Confluent - ‘scalable generative AI requires a different kind of data infrastructure’

Derek du Preez - October 3, 2023
Confluent’s generative AI pitch is somewhat different to other vendors in the market. Its focus on the fundamentals of making data an asset is worth considering.


Confluent’s AI pitch, as it stands, is that enterprise buyers need to rethink their data infrastructure if they want to take advantage of generative AI at any sort of scale. Rather than pushing generative AI models out to its customers, Confluent is hoping that by offering them a way to make their data usable, well governed and trustworthy - whilst giving them choice and access to AI technology partners - it can play a fundamental role in getting customers to their desired outcomes.

This is a somewhat different approach from what we’re seeing elsewhere. This conference season has seen a slew of vendors position themselves as the generative AI solution of choice for enterprise buyer needs. It’s interesting that just a few months ago, generative AI wasn’t even on most of their radars.

Technology moves quickly, however, and it’s been impressive to see the speed at which a lot of these enterprise players have brought large language models (LLMs) and other generative AI solutions to their customers. That being said, customer references are still thin on the ground and much of what we are seeing is grandstanding in an attempt to quickly gain market share.

With this context in mind, it was with interest that I spoke to Andrew Sellers, the lead of Confluent’s Technology Strategy Group, which is focused on advising the company on product development and understanding customer needs over the mid to long-term. Unsurprisingly, Sellers and his team have been considering the impact of generative AI on Confluent’s business, given that data is the core of what Confluent does. 

By way of background, Confluent provides a commercial offering of Apache Kafka as a ‘data streaming platform’. This means connecting, processing, governing and sharing data streams in real time. Its pitch is that companies that rely on data stored in databases, using batch processing to gain insights and develop applications, aren’t in tune with a world that has moved to always-on, real-time data environments. And it’s Confluent’s job, as it sees it, to make these real-time data streams more usable for the enterprise.

But how does data streaming lend itself to generative AI adoption in the enterprise? Sellers argues that a data streaming platform requires an organization’s data infrastructure to be reusable and trustworthy - which should be a prerequisite if buyers want to do more than just piece together one or two generative models. Sellers said: 

We've always felt like AI is three to five years out. This moment feels different. And I think a big part of it is, with the foundation models, you sort of have to treat them like black boxes. But in a sense, it's actually quite liberating, because now you don't need the army of PhDs in statistics to run this thing. 

But the data engineering challenges still remain. And so a lot of what we look at when we talk to our customers is more about a readiness for AI from a data perspective. This is why I think with data products, it's very timely, because the idea of data becoming this trustworthy, reusable, discoverable asset, is I think most of the battle in terms of the readiness for that. 

And certainly, if someone's going to build their first generative AI application, they can kind of hack that together with the bespoke operational data stores or whatever. But if you want to build your 10th or your 100th, I think that's where the really good businesses, that are going to compete effectively, that's where it's going to need to go. These technologies need to be incorporated into everything that you do. 

You need a scalable, repeatable pattern for doing that. And that's where I think data streaming can work so effectively.

Can’t exist in a vacuum

Confluent’s product pitch for AI was announced last week at its annual user conference in San Jose. Whilst it did make some announcements about how it would use generative AI to make its platform easier to use for customers, the bulk of what was announced was about integrating with other technology providers, rather than trying to take ownership of the generative AI models itself. The key partnership announcements included: 

  • Technology Partners - Confluent is partnering with MongoDB, Pinecone, Rockset, Weaviate, and Zilliz to provide real-time contextual data from anywhere for their vector databases. Vector databases are especially important in the world of AI, as they can store, index, and augment large data sets in formats that AI technologies like LLMs require. 

  • Public Cloud Partners - Confluent is also building on its agreements with Google Cloud and Microsoft Azure to develop integrations, proofs of concept (POCs), and go-to-market efforts specifically around AI. For example, Confluent plans to use Google Cloud’s generative AI capabilities, with the aim of improving business insights and operational efficiencies for retail and financial services customers. And, with Azure OpenAI and Azure Data Platform, Confluent is planning to create a Copilot Solution Template that allows AI assistants to perform business transactions and provide real-time updates.

  • Services Partners - Finally, Confluent is launching POC-ready architectures with Allata and iLink that cover Confluent’s technology and cloud partners to offer tailored solutions for vertical use cases. 

Sellers said that Confluent wants to be a platform of enablement for its user base and that it isn’t interested in building models itself. He said: 

For generative AI, the thing that we've really been working hard on for months now is that this stuff can't exist in a vacuum. The data streaming platform can only be helpful here, if it integrates with those technologies that enable generative AI. That's why the partnership messaging story was so important - because we don't want to build LLMs. We want to integrate with whatever you choose for that. And so for us right now, it's about finding the right technologies and creating the right seamless integrations for our partners. So not just connectors, but native integrations.

And choice is key. He added: 

I'm very proud of the cohort we put together with vector stores and embedding services and the really popular generative AI orchestration frameworks. This technology is evolving quickly, and we don't want anyone to be locked into a particular LLM or something. Those are constantly improving. 

The wonderful thing about event-driven patterns is that it's all about decoupling teams and technologies and systems. By doing that, then you're not tightly integrated with this LLM, you can just make a new one, because it just hangs off a topic. You connect the new element to that topic, and you're done. It gets replayable. There's really no operational impact of switching. And that's the kind of thing we’re trying to make seamless right now.
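
To make the decoupling point concrete, here is a minimal sketch of the idea in plain Python. It does not use Kafka or any Confluent API: the `Topic` class is a hypothetical in-memory stand-in for a topic, and `model_a` and `model_b` are placeholder functions standing in for two different LLMs. It only illustrates how a new consumer can attach to the same log and replay it from the start without the producer or the existing consumer changing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Topic:
    """Append-only in-memory log; a hypothetical stand-in for a Kafka topic."""
    events: List[str] = field(default_factory=list)
    # Maps a consumer group name to the next offset it should read.
    offsets: Dict[str, int] = field(default_factory=dict)

    def publish(self, event: str) -> None:
        self.events.append(event)

    def poll(self, group: str) -> List[str]:
        """Return events this group has not seen yet and advance its offset."""
        start = self.offsets.get(group, 0)  # unknown groups start at offset 0
        batch = self.events[start:]
        self.offsets[group] = len(self.events)
        return batch


def model_a(prompt: str) -> str:
    """Placeholder for the first model (not a real LLM call)."""
    return f"A:{prompt}"


def model_b(prompt: str) -> str:
    """Placeholder for a replacement model (not a real LLM call)."""
    return f"B:{prompt}"


def run_consumer(topic: Topic, group: str, model: Callable[[str], str]) -> List[str]:
    """Feed every unread event on the topic through the given model."""
    return [model(event) for event in topic.poll(group)]


prompts = Topic()
prompts.publish("order 17 shipped")
prompts.publish("refund 4 issued")

# The existing model consumes the topic under its own group name.
out_a = run_consumer(prompts, "model-a", model_a)

# Swapping in a new model means attaching a new group; it replays from offset 0.
out_b = run_consumer(prompts, "model-b", model_b)

# New events reach the new model without touching the producer.
prompts.publish("order 18 placed")
out_b_next = run_consumer(prompts, "model-b", model_b)
```

In this sketch the producer never knows which model is reading, which is the property Sellers is pointing at. A real deployment would rely on the platform's own consumer groups and offset management rather than this toy class.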

Event streaming is a useful model

Sellers’ advice for enterprises that want to be competitive with generative AI is that they should rationalize the data they have and start creating data products, so that data becomes an asset that is reusable and trustworthy. He repeats that if companies want to do more than ‘cobble together’ one generative AI use case, they need to make their internal data consumable. Sellers argued that this isn’t easy when using traditional data stores, but is well aligned with Confluent’s data streaming model. He added:

That can be really challenging if you're not using data streaming because [data stores] all kind of work that way, where you're sort of locked into a data model. And data models come with trade-offs in terms of flexibility. Whereas when you specialize around consuming, making sure that the data can be consumed readily, you get to put those decisions off. 

And the other element to this is that generative AI typically relies on real-time contextualization. This is a slight difference to how we perhaps thought about AI a couple of years ago, which meant building a bespoke model and using it for some sort of analysis or understanding of data that was sitting at rest. The use of the AI was periodic, but generative AI has shifted us to seeing it as continuous and real-time based on up-to-date data. Again, Sellers sees data streaming and event-driven architectures, underpinned by data products, as key here. He added: 

With generative AI, because one of the best patterns we have is this prompt engineering, contextualization has to happen near instantaneously to deliver the reactive sophisticated experiences that customers have come to expect.

And so that's why now it becomes even more incumbent on our customers and others, to bring these things to market in a real way. LLMs are great, but they don't really add any value if they don’t know anything about your data. So you've got to be able to contextualize it in a reasonable way. 

There’s a few generalizable steps that a lot of these applications have. There's that first step, which is data augmentation, where there's almost a staging in a vector data store or something appropriate like that. So we can check for relevance against embeddings and unstructured data, or just some sort of operational path to use the LLM as a reasoning agent. There's those sorts of inference steps, those chains of inference steps.

And because no one really trusts these things, there's always a post-processing step that is going to validate that this conforms to some kind of business process or something. And each of those steps I just described are very effectively implemented as an event-driven pattern.

And so that's why I think there's a lot of AI washing right now and why I'm really excited to be here. We don't pretend to be like LLM builders or something like that. But there's an important part of this story, where you need to think about how data moves and how you do it in a reliable and repeatable way. And that’s where we help. 
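
The steps Sellers outlines (augmentation against a vector store, an inference step, then post-processing validation) can be sketched as a chain of stages that each read from one queue and write to the next. This is an illustrative toy, not Confluent's implementation: the queues stand in for topics, the three-dimensional embeddings and documents are made-up values, `infer` is a stub rather than a real LLM call, and the validation rule is an invented example of a business check.

```python
import math
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, object]

# Toy "vector store": document text mapped to a made-up 3-d embedding.
VECTOR_STORE: Dict[str, List[float]] = {
    "Refunds are processed within 5 days.": [1.0, 0.0, 0.0],
    "Orders ship from the main warehouse.": [0.0, 1.0, 0.0],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def augment(event: Event) -> Event:
    """Step 1: attach the most relevant stored document as context."""
    query = event["embedding"]
    best = max(VECTOR_STORE, key=lambda doc: cosine(VECTOR_STORE[doc], query))
    return {**event, "context": best}


def infer(event: Event) -> Event:
    """Step 2: stub for the LLM call; it simply echoes the retrieved context."""
    return {**event, "answer": f"Based on our records: {event['context']}"}


def validate(event: Event) -> Event:
    """Step 3: hypothetical business rule applied before the answer is released."""
    answer = str(event["answer"])
    approved = len(answer) <= 200 and "guarantee" not in answer.lower()
    return {**event, "approved": approved}


def drain(source: Deque[Event], sink: Deque[Event], stage: Callable[[Event], Event]) -> None:
    """Move every waiting event from source to sink through one stage."""
    while source:
        sink.append(stage(source.popleft()))


# Each queue plays the role of a topic between two stages.
questions: Deque[Event] = deque()
contextualized: Deque[Event] = deque()
answers: Deque[Event] = deque()
validated: Deque[Event] = deque()

questions.append({"question": "When will I get my refund?", "embedding": [0.9, 0.1, 0.0]})

drain(questions, contextualized, augment)
drain(contextualized, answers, infer)
drain(answers, validated, validate)

result = validated[0]
```

Because each stage only knows about its input and output queue, any one of them could be replaced or scaled independently, which is the repeatability argument being made in the quote.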

My take

An interesting pitch from Confluent and one that makes sense, in theory. It is fair to argue that enterprises have long struggled with stagnant, disorganized data and that they need to create an environment where the data becomes the product, becomes reusable and is governed in a way that makes AI use trustworthy. The challenge will be convincing buyers that they need to put in the work for what can be a difficult infrastructure change, when other vendors are offering quick wins (albeit untested in many cases). That being said, I’m sure those that are willing to get the foundations right will see the greatest results. This time next year we should have a better idea of how this is playing out for buyers and where their budgets and priorities will land.
