Data streaming explained: Why it is now indispensable!
Imagine the data in your company were gold coins buried deep underground. To mine them and unlock their full value, you need not only the right tools, but also a clear strategy for what you want to do with them. For gold coins, that strategy might be the right investment opportunities; for data, the key to efficient use, data-driven decisions and the application of artificial intelligence (AI) lies in the availability of up-to-date and consistent data. Unfortunately, this is exactly where many companies fail: either the data quality is insufficient or the necessary automated data flows are missing.
Data streaming offers a solution to exactly this problem. But what is it, anyway? Think of a flowing river that constantly provides fresh water. In much the same way, data streaming ensures that your data is always up to date and immediately available. This matters because the freshness of data plays a crucial role in many areas. Stock quotes are a good example: they can skyrocket or crash within seconds. Old quotes quickly lose their value, and only up-to-date data enables precise decisions.
Another example of so-called real-time data is a customer visiting a website. To recommend the right product at exactly the right moment, the relevant data must be analyzed in real time. Data streaming makes it possible to reach customers precisely when they are most receptive to offers.
Real-time data is becoming mandatory
However, data streaming does more than give companies the tools to unearth the hidden treasures in their data and turn them into money. The modern technologies associated with data streaming have changed our entire way of thinking. Especially when browsing on our smartphones, we expect immediate results - preferably before we have even made a specific request. This is made possible by personalized insights derived from collected data.
Payment transactions are another example. Nobody wants to wait until a nightly batch job has processed the transactions, as is still common practice in many companies. Instead, customers demand real-time responses and immediate availability of balances and transaction overviews. In Switzerland, however, this is likely to be a thing of the past from August 2024 at the latest: The introduction of instant payments is about to hit the banking world. These instant payments will require all types of validation and fraud detection to be carried out in real time - a perfect use case for data streaming.
But it's not just banking that is changing: the entire world is continuously moving towards real-time processing, meaning that data is processed at the exact moment an event occurs. The days of collecting events and processing them later in batches are over. About 15 years ago, LinkedIn laid the foundation for this with the open source solution Apache Kafka, which is now regarded as the de facto standard for data streaming and enables large volumes of data to be received, stored and processed in near real time. More than 90% of the world's 500 largest companies now use Apache Kafka. In Switzerland, too, companies from a wide range of sectors - from banking and insurance, retail and transportation to public administration and industry - rely on Apache Kafka.
Why Kafka is a must for every company!
But if you think that only social networks like LinkedIn or big tech companies like Apple, Google, Meta, Netflix and Uber need such solutions because of their huge volumes of data, you're wrong. Even companies that process only small amounts of data benefit from Apache Kafka. They use it primarily as a kind of data hub to exchange and distribute data reliably, consistently and cost-effectively between different systems.
Nowadays, this approach is essential for any company that runs several central applications or has even built its own. Typical examples of such core applications include Avaloq and Finnova in banking, Syrius in health insurance, and SAP and Abacus as widely used ERP systems. CRM solutions such as Salesforce and HubSpot are also among the applications whose data companies want to access and combine.
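To make the data hub idea a little more concrete, here is a minimal sketch using the confluent-kafka Python client. It assumes a Kafka broker running locally; the topic name "customer-events" and the event payload are purely illustrative. The point is simply that a source application publishes a change event to a topic without needing to know which other systems will read it.

```python
# Minimal sketch: publish a business event to Kafka so other systems can consume it.
# Assumes the confluent-kafka package and a broker reachable at localhost:9092;
# the topic name and payload are illustrative only.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def delivery_report(err, msg):
    """Report whether the broker confirmed the write."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Event stored in {msg.topic()} [partition {msg.partition()}]")

# A hypothetical CRM change event, keyed by customer ID so related events stay in order.
event = {"customer_id": "42", "action": "address_changed", "source": "CRM"}
producer.produce(
    "customer-events",
    key=event["customer_id"],
    value=json.dumps(event).encode("utf-8"),
    callback=delivery_report,
)
producer.flush()  # Block until the broker has acknowledged the message.
```

The producer only writes to the topic; Kafka takes care of storing the event reliably and making it available to any number of downstream consumers.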
Decoupling brings order to the data chaos
Another reason to use Kafka as a data hub is to decouple the producers and consumers of data from one another. This avoids hard-to-maintain point-to-point connections between applications, which in turn improves the overall efficiency and maintainability of the system landscape. Such decoupling makes it possible to avoid - or at least reduce - the dreaded "big ball of mud" or "spaghetti integration". Spoiler: this is not about food.
Imagine your organization as a giant plate of spaghetti in which every piece of data is tightly entangled with the next. Over time, the mess grows, and adjustments to one application suddenly have major, unexpected effects on the entire company. The result is a kind of domino effect in which a small change has a big impact and suddenly further changes become necessary. A real nightmare for IT departments.
With Apache Kafka, this can largely be avoided. Thanks to Kafka's "pull approach", applications retrieve and process data at their own pace. It is as if every machine in a large factory works autonomously and performs its tasks independently. If one machine fails or malfunctions, it can simply be restarted and put back into operation without the entire production process having to stop. After all, the raw materials and intermediate products - in our case the data - remain available. Or, to pick up the spaghetti metaphor again: the sauce can easily be kept warm until the spaghetti is al dente.
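The following sketch illustrates the pull approach, again assuming the confluent-kafka Python client, a local broker and the hypothetical "customer-events" topic from above. The consumer polls at its own pace, and because Kafka tracks committed offsets per consumer group, it simply resumes where it left off after a restart.

```python
# Minimal sketch of Kafka's pull approach: the consumer fetches messages at its own pace.
# Assumes confluent-kafka, a broker at localhost:9092 and the illustrative topic above.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "billing-service",        # hypothetical consuming application
    "auto.offset.reset": "earliest",      # start from the beginning if no offset is stored
    "enable.auto.commit": True,           # committed offsets let us resume after a restart
})
consumer.subscribe(["customer-events"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # pull the next message; None if nothing new
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Process the event at whatever pace this application can sustain.
        print(f"Processing {msg.key()}: {msg.value().decode('utf-8')}")
finally:
    consumer.close()  # commit final offsets and leave the consumer group cleanly
```

If the consumer is stopped for maintenance, nothing is lost: the events wait in the topic, and processing continues from the last committed offset once the application is back up.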
Good to know!
Important data often remains hidden in the silos of a company's legacy systems, such as ERP and CRM systems or the core systems of a bank or insurance company. Today, this data is typically exported, processed and transferred to a central data warehouse either manually and laboriously or through nightly automated ETL (extract-transform-load) processes.