Building Generative AI - Managing Systems Over Time (Part Two)

(Editor’s Note: Part 1 appeared on August 25 – https://www.architectureandgovernance.com/applications-technology/building-generative-ai-supporting-vector-data-part-one/)

By Dom Couldwell, Head of Field Engineering EMEA, DataStax

Implementing Generative AI involves putting together your data and models to create a working system that can answer requests with appropriate responses. However, once you have this in place, you will have to both keep your Generative AI running and find ways to improve your results over time. As this area is still so new, the approaches to managing and operating Generative AI systems are still in development themselves, so what areas should you be looking at?

Improving result quality with RAG

Generative AI and large language models (LLMs) can respond to questions with answers that seem credible. However, while the language may sound confident, the material itself may be incomplete or inaccurate, or the LLM may offer up results that are not grounded in reality.

All LLMs have a cut-off date after which no further data was included in their training. For example, OpenAI’s ChatGPT was trained with data up to September 2021, so it tends not to include accurate material after that date. Retraining the model itself is very expensive and not suitable for the vast majority of organisations, so we have to use a different approach to improve responses.

To improve, we have to look at how we provide more context for the Generative AI system to work with. This data can come from multiple sources, from the classic data management systems used to store operational company data through to Knowledge Management systems and analytics platforms like Data Warehouses. The aim here is to ensure that the LLM interacts with this additional data in a standardized way, which can make it easier to manage and secure the data and provide the right results back to users.

To achieve this result, we can use retrieval augmented generation (RAG) to blend multiple sets of data together for the LLM to use in formulating a response to a user query. Your own data can be turned into vector embeddings that can then be compared to the query, in order to find content that is semantically similar. The matching content is then passed to the LLM as additional context for generating the response to the user.
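The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the documents, the three-dimensional embeddings and the function names (`retrieve`, `build_prompt`) are all hypothetical, and a real system would obtain embeddings from an embedding model and store them in a vector database.

```python
import math

# Hypothetical toy corpus: each entry pairs a text chunk with a made-up
# 3-dimensional embedding. A real system would obtain embeddings from an
# embedding model and keep them in a vector database.
DOCUMENTS = [
    ("Our returns policy allows refunds within 30 days.", [0.9, 0.1, 0.0]),
    ("The head office is open from 9am to 5pm on weekdays.", [0.1, 0.9, 0.0]),
    ("Premium customers get free next-day delivery.", [0.0, 0.2, 0.9]),
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, documents, top_k=2):
    """Return the top_k text chunks most similar to the query embedding."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_embedding, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the user question into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Assumed embedding for the question "Can I get a refund?".
query_embedding = [0.8, 0.2, 0.1]
chunks = retrieve(query_embedding, DOCUMENTS, top_k=1)
prompt = build_prompt("Can I get a refund?", chunks)
print(prompt)
```

The prompt that reaches the LLM contains the retrieved text itself. The embeddings are used only to decide which text to include.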

This can be sets of your own data around specific topics and areas of expertise, such as information on business tasks or use cases. It can also include customer interactions that can be used to remember the interaction history with each customer. This is useful when you have customer relationships that you expect to last a long time and you want to keep those transaction records consistent. At the same time, you may have to consider whether these interactions will include personally identifiable information and any governance requirements that have to be met. Adding your own vector data can improve the quality of the response, for example by referring to historical interactions or previous purchases.

For enterprises that handle sensitive personal information or have governance requirements to bear in mind, using your own data with LLMs will require you to analyse your security and privacy approach and confirm that you have the right governance in place around that data. As you store user prompts, chat history and context data, you will have to check that this takes place in a controlled environment for compliance. If you use an LLM as a service or in a client-controlled environment, then you will also have to check that your data is kept separate.

Supporting developers around generative AI applications

Alongside improving the quality of Generative AI responses over time, we have to make it easy to integrate those results into our applications and services. Whether it is a single interface or a full augmented agent service, the user will interact with the application and the large language model behind it. For developers, this will mean connecting to the LLM and providing results back. However, with the potential to use multiple LLMs at the same time, integrating directly with each LLM represents a potential management overhead. To solve this potential headache and make it easier to interact, developers can use language model integration frameworks like LangChain and LangStream instead.

LangChain integrates with other components of the AI landscape, from data storage or documents through to code analysis and other tools. Alongside LangChain and LangStream, LlamaIndex provides a data framework for connecting custom data sources to large language models. Underneath services like LangChain and LlamaIndex there will be the databases and data sources used to provide data into the generative AI application. However, connecting and managing each of these data sources is another source of overhead, as drivers will be needed to convert transaction requests between the database environment and the services above them.

LangChain provides you with one standard interface for many use cases, which can simplify interactions with multiple libraries and make inferences from data output rather than the code itself. Additionally, LangChain can orchestrate a series of prompts to achieve a desired outcome within an application. This removes the overhead of connecting multiple different components to each other. Similarly, LangStream uses linked AI agents to process an input message, carry out a task, and then create a new message that can be passed to the next agent. This makes it easier to abstract away integrations and build applications.
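The idea of chaining steps, where each one takes a message, carries out a task and hands a new message to the next, can be illustrated without any framework. The sketch below is plain Python and does not use the LangChain or LangStream APIs; `fake_llm`, `summarise`, `translate` and `run_chain` are hypothetical names, and `fake_llm` is a stub that stands in for a real model call.

```python
from typing import Callable, List

# A step takes the previous step's output message and returns a new message.
Step = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it only echoes the prompt."""
    return f"[model output for: {prompt}]"


def summarise(message: str) -> str:
    """First step: ask the (stub) model to summarise the input."""
    return fake_llm(f"Summarise: {message}")


def translate(message: str) -> str:
    """Second step: ask the (stub) model to translate the previous output."""
    return fake_llm(f"Translate to French: {message}")


def run_chain(steps: List[Step], message: str) -> str:
    """Pass the message through each step in order."""
    for step in steps:
        message = step(message)
    return message


result = run_chain([summarise, translate], "Quarterly sales rose by 4%.")
print(result)
```

Because every step shares the same signature, steps can be reordered or swapped without changing the code that runs the chain. Integration frameworks offer a similar kind of uniformity, along with connectors to models and data sources.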

For example, the open source database Apache Cassandra is popular for large transactional workloads where real-time performance is necessary. Alongside this, it is commonly used as a feature store for predictive AI services, such as Michelangelo at Uber. Apache Cassandra can be used for vector data based on Apache Lucene. The open source project CassIO (www.cassio.org) can make it easier to connect LangChain or other LLM framework tools into the vector database layer provided by Apache Cassandra. In this instance, CassIO acts as a mediator between your application, any frameworks like LangChain or LlamaIndex, and any existing Cassandra databases that you have in place.

This approach makes it easier to manage data in one place, as you can use your vector database alongside your transactional data for more efficient and effective data management. However, you may also choose to use CassIO directly in your application, particularly when you are handling vector data on items other than text.

Thinking ahead around generative AI infrastructure

To make the most of Generative AI, we have to make LLMs smarter. We can achieve this by providing more context in terms of more recent data and by using prompts to make the response more personalised. In order to get ready for this, you’ll need to vectorise your data to deliver the additional context and prompts.

To build and operate your own Generative AI system, running a vector database for storing your own data as embeddings will be a key first step in the process. On top of this base, you will also have to consider data privacy and security for that data over time, just as you would with any other PII, IP or other sensitive information. Using techniques like RAG can provide more control over how that data is managed and how it can be used to improve results too.

Dom Couldwell is Head of Field Engineering EMEA at DataStax, a real-time data and AI company. Dom helps companies to implement real-time applications based on an open source stack that just works. His previous work includes more than two decades of experience across a variety of verticals including Financial Services, Healthcare and Retail. Prior to DataStax, he worked for the likes of Google, Apigee and Deutsche Bank. www.datastax.com