Integrating LLMs with Real-Time Data
Integrating Large Language Models (LLMs) with IoT and real-time streaming applications can greatly benefit knowledge workers by enhancing their productivity and providing access to relevant, summarized information. This paper outlines how that integration can be achieved, and specifically how Vantiq helps incorporate LLM knowledge and user interactions as enhancements to existing enterprise workflows.
The Limitations of LLMs
Knowledge workers must process and filter vast amounts of information to perform their jobs. For instance, a field service technician may need to understand and fix over 100 different interrelated pieces of equipment, while a doctor must combine dozens of symptoms, drug side effects, and diagnostic findings to reach a potential patient diagnosis and care plan. Much of this information exists in the form of manuals, textbooks, knowledge bases, and other sources. Combining this reference information with current data is essential for diagnosing machine faults and determining the steps to resolve them, or for reaching a potential patient diagnosis.
Tools like ChatGPT and the Large Language Models (LLMs) that underpin them are increasingly being used by knowledge workers. However, ChatGPT and LLMs have several limitations. While LLMs excel at understanding and producing new content, they can only produce content based on the corpus of data on which they were trained. Although that corpus is vast, it consists of publicly accessible information (circa 2021 for current models) such as Wikipedia and GitHub, so the models have little or no domain-specific intelligence. Furthermore, tools like ChatGPT do not have access to real-time or current information about a piece of equipment or environmental data. They also lack access to historic information, as this often resides in closed, non-searchable sources and is updated continuously.
Another limitation of LLMs is context size. The context or prompt size varies depending on the model, generally ranging from 2K tokens (equivalent to 2-3 pages of text) to 32K tokens (40-50 pages of text). This context must accommodate the query, any supporting context information, and the response generated by the LLM. Even with the larger context sizes (32K), it is impossible to provide a complete knowledge base within the context.
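As a rough illustration, the sketch below shows how one might check whether a question and its supporting context fit within a model's context window before sending a request. It assumes the open-source tiktoken tokenizer; the model name and token limit are illustrative, not prescriptive.

```python
# A minimal sketch of checking whether a prompt fits in a model's context
# window. Assumes the open-source tiktoken package; the model name and
# context limit below are illustrative.
import tiktoken

MODEL = "gpt-4"        # illustrative model choice
CONTEXT_LIMIT = 8192   # tokens; actual limits range from roughly 2K to 32K and beyond

def fits_in_context(question: str, context_docs: list[str],
                    reserved_for_response: int = 1024) -> bool:
    """Return True if the question plus context leaves room for the response."""
    enc = tiktoken.encoding_for_model(MODEL)
    prompt = question + "\n\n" + "\n\n".join(context_docs)
    used = len(enc.encode(prompt))
    return used + reserved_for_response <= CONTEXT_LIMIT
```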
Privacy is another concern. Some of the information that might be sent to a tool like ChatGPT can be very sensitive and private; entering patient information or a client’s financial information into such tools is infeasible. As an alternative, solutions like private LLMs may prove useful. Currently, most of the focus and discussion is on general-purpose LLMs like GPT and BERT, but over time we will start to see highly specialized LLMs trained on industry- and domain-specific information that can ensure the sovereignty of sensitive data included in a prompt.
Training custom LLMs is not always practical or necessary. Training an LLM is a huge undertaking in terms of cost, time, and skills, and a typical enterprise organization may not have this capability. In some cases it may be advantageous to train a custom LLM where language and context are highly specialized, such as in the healthcare or legal markets; however, there is still no effective way to keep the LLM up to date. It is estimated that two medical papers are published every minute (https://www.nature.com/articles/nj7612-457a) and that more than eighty new law cases are filed every hour (https://www.uscourts.gov/statistics-reports/federal-judicial-caseload-statistics-2021). Therefore, even a custom LLM must still be combined with an up-to-date knowledge base. In many cases a custom LLM is unnecessary if an existing LLM is sent the appropriate contextual knowledge when prompted, using a process called semantic search, described in detail below.
Various technologies and techniques can address these issues and support these styles of applications. We need to combine existing reference material and knowledge bases, real-time status information about a piece of equipment or patient, historic information about that equipment or patient, and textual prompts from a user to guide the LLM to generate content specific to the situation at hand.
Incorporating Knowledge Bases with Semantic Search
Knowledge bases, reference text, and other forms of documentation often exist within an enterprise, and knowledge workers need to be able to search them. Traditional search approaches are keyword-based. With LLMs, a more productive approach is to use Semantic Search to gather information related to the question and present that additional information to the LLM as part of the context/prompt. Semantic Search determines the intent and contextual meaning of the question and uses that meaning, rather than keywords, as the search query to find matching concepts in the data source.
Populating the Database
The first half of the Semantic Search problem involves encoding the knowledge base and documentation as embeddings, typically using an embedding model associated with the LLM. Creating these embeddings is compute-intensive, so they are generally stored in a vector database for efficient retrieval. The vector database stores the embeddings along with associated metadata, such as the source document for each embedding. Large documents are broken into smaller pieces, or chunks, so that the LLM is later presented with only the most relevant information.
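A minimal sketch of this population step is shown below. It assumes the open-source chromadb vector database and a sentence-transformers embedding model; the chunking strategy, model, and collection name are illustrative choices rather than a prescribed stack.

```python
# A minimal sketch of populating a vector database with embedded document
# chunks. Assumes the chromadb and sentence-transformers packages; the
# embedding model, chunk size, and collection name are illustrative.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model
client = chromadb.Client()
kb = client.create_collection(name="service_manuals")

def add_document(doc_id: str, text: str, chunk_size: int = 1000) -> None:
    """Split a large document into chunks, embed each, and store with metadata."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    embeddings = embedder.encode(chunks).tolist()
    kb.add(
        ids=[f"{doc_id}-{n}" for n in range(len(chunks))],
        embeddings=embeddings,
        documents=chunks,
        metadatas=[{"source": doc_id, "chunk": n} for n in range(len(chunks))],
    )
```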
Retrieval from the Database
To retrieve the documents or knowledge relevant to the user's question or prompt, the question is encoded as an embedding and a similarity search is performed against the vector database. This process retrieves the documents or chunks that may be relevant to the question being asked.
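Continuing the sketch above, retrieval embeds the question with the same model and runs a similarity search against the collection:

```python
# Retrieval step for the sketch above: embed the user's question and return
# the most similar knowledge-base chunks.
def retrieve_context(question: str, n_results: int = 4) -> list[str]:
    query_embedding = embedder.encode([question]).tolist()
    hits = kb.query(query_embeddings=query_embedding, n_results=n_results)
    return hits["documents"][0]   # chunks matching the first (and only) query
```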
Combine Question, Semantic Search Results, and Situational Awareness
At this point, we could simply add the results of the semantic search to the prompt along with the user's question. However, we may wish to add more context. For instance, we could add the last 10 minutes of sensor data from the malfunctioning piece of machinery, or combine the current patient's vital signs with the semantic search results and the question. We could also add historic information, such as a patient's medical records or the list of drugs the patient is currently taking, to the prompt sent to the LLM. This information does not belong in the vector database: it is usually known ahead of time and is associated not with the question itself but with the context in which the question is being asked, so storing it in a vector database is unnecessary and inefficient. It is more efficient to look up a patient's medical history or drug data from a relational database, or the service history of a piece of equipment, since we already know the patient's name or the ID of the failing equipment.
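A hedged sketch of this assembly step follows. The lookup helpers get_recent_sensor_readings and get_service_history are hypothetical stand-ins for queries against a time-series or relational store; retrieve_context comes from the earlier sketch.

```python
# A minimal sketch of combining the question, semantic search results, and
# situational data into one prompt. get_recent_sensor_readings and
# get_service_history are hypothetical lookups against operational stores.
def build_prompt(question: str, equipment_id: str) -> str:
    kb_chunks = retrieve_context(question)                            # semantic search results
    readings = get_recent_sensor_readings(equipment_id, minutes=10)   # hypothetical real-time lookup
    history = get_service_history(equipment_id)                       # hypothetical relational lookup
    return (
        "You are assisting a field service technician.\n\n"
        f"Recent sensor data:\n{readings}\n\n"
        f"Service history:\n{history}\n\n"
        "Reference material:\n" + "\n---\n".join(kb_chunks) + "\n\n"
        f"Question: {question}"
    )
```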
Use Case Classifications
There are a variety of use case classifications that could use the above model. Asking an LLM a question is compute-intensive and relatively slow: LLMs tend to respond at human speed rather than machine speed, with responses taking anywhere from 1-30 seconds depending on the question and the amount of context information provided. LLMs should therefore not be used to detect situations in real-time streams of data; instead, they should provide knowledge workers with information and allow them to refine and summarize that information when needed.
Smart Notifications and Alerts
Typically, applications send basic notifications to knowledge workers when situations of interest are detected; such a notification might contain details about the situation and links to further information. A Smart Notification would use an LLM, in combination with real-time situational information, one or more knowledge bases, and potentially historic information, to provide the knowledge worker with an initial diagnosis. The knowledge worker could then further refine or query the information provided, as in the sketch below.
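The following sketch builds on the earlier examples to show the shape of such a flow. The OpenAI chat API stands in for whichever LLM is actually used, and notify_technician is a hypothetical delivery function (mobile push, chat message, and so on).

```python
# A sketch of a Smart Notification: when a situation is detected, an initial
# diagnosis is generated from the assembled prompt and pushed to the knowledge
# worker. The OpenAI client is illustrative; notify_technician is hypothetical.
from openai import OpenAI

llm = OpenAI()   # assumes an API key is available in the environment

def send_smart_notification(equipment_id: str) -> None:
    prompt = build_prompt("What is the likely fault and the first repair step?",
                          equipment_id)
    response = llm.chat.completions.create(
        model="gpt-4o",                                    # illustrative model
        messages=[{"role": "user", "content": prompt}],
    )
    diagnosis = response.choices[0].message.content
    notify_technician(equipment_id, diagnosis)             # hypothetical delivery channel
```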
Context-aware Assistance
Knowledge workers often review patients or equipment on a routine basis. Context-aware Assistance could combine the current status of a patient or piece of equipment with historic information, and potentially with knowledge bases, to give the knowledge worker a more refined view of that patient's or equipment's current status.
Vantiq and LLMs
Vantiq real-time applications process vast amounts of streaming data to detect situations of interest. These applications may orchestrate multiple AI and ML platforms, as well as existing backend applications, to detect situations of interest and bring them to resolution. Vantiq has added several LLM-based capabilities to further enhance real-time applications, so that LLMs can now participate in resolving situations of interest.
Semantic Search Capabilities
Vantiq provides the ability to populate and query a vector database to support Semantic Search capabilities:
- Automatic creation and storage of embeddings in a Vector Database
- Support for a variety of common file formats such as text, PDF, and Word
- Ability to query and perform semantic search against the Vector Database and include/combine the results in a prompt
Access to LLM Capabilities
Integration with a variety of LLM capabilities is included in the platform, including the following:
- Integration with several commercial and open-source Large Language Models
- All the developer must provide is an access token, and the model can be integrated into a Vantiq application
- Prompt Templates
- Construction of prompts can be defined using a powerful templating engine to increase developer productivity
- Easily combine real-time information, historic information, and semantic search results with a question (a generic illustration follows this list)
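As a generic illustration of the prompt-template idea (not Vantiq's own template syntax), the sketch below uses the jinja2 library to merge live readings, history, and retrieved reference material into a single prompt. All values shown are made-up examples.

```python
# A generic prompt-template sketch using jinja2; this is not Vantiq's template
# syntax, only an illustration of combining real-time, historic, and
# semantic-search inputs into one prompt.
from jinja2 import Template

DIAGNOSIS_TEMPLATE = Template(
    "Patient vitals (live): {{ vitals }}\n"
    "Medical history: {{ history }}\n"
    "Relevant reference material:\n"
    "{% for doc in kb_chunks %}- {{ doc }}\n{% endfor %}"
    "Question: {{ question }}"
)

prompt = DIAGNOSIS_TEMPLATE.render(
    vitals={"heart_rate": 128, "spo2": 0.91},             # illustrative live readings
    history="Type 2 diabetes; allergic to penicillin",     # illustrative history
    kb_chunks=["Tachycardia management guideline excerpt...",
               "Oxygen therapy reference excerpt..."],
    question="What immediate interventions should be considered?",
)
```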
These capabilities are integrated into Vantiq's agile Visual Event Handler capabilities through the inclusion of several LLM- and Semantic Search-related Activity Patterns.
These capabilities can be combined with Vantiq's existing capabilities to simplify building new applications and integrating these features into existing ones.
- In-memory state management – This allows you to manage and maintain conversational history in memory rather than in a database, greatly improving application performance. In-memory state can also be replicated across a cluster automatically, providing the reliability of a database without the performance cost.
- Collaboration capabilities – Vantiq’s various Collaboration capabilities can be integrated with the LLM capabilities.
- Ability to limit Collaboration instances to one per situation of interest.
- Integration with mobile applications
- Integration with chatbot capabilities
The following outlines a simple Visual Event Handler that monitors various patient vital signs, looking for situations indicating that a patient needs immediate attention. Once a situation has been detected, an initial prompt is defined by combining the patient's status, their patient history, and any information from a knowledge base that may help a doctor with a diagnosis. The system poses an initial question to the LLM and immediately sends the response to the doctor, who can then ask the system follow-up questions.
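The plain-Python sketch below approximates that flow (it is not the Visual Event Handler itself). fetch_patient_history and send_to_doctor are hypothetical integration points, and the LLM client and retrieval helper come from the earlier sketches.

```python
# A plain-Python approximation of the event-handler flow described above.
# fetch_patient_history and send_to_doctor are hypothetical integration points;
# llm and retrieve_context come from the earlier sketches.
def on_patient_alert(patient_id: str, vitals: dict) -> list[dict]:
    history = fetch_patient_history(patient_id)            # hypothetical EMR lookup
    kb_chunks = retrieve_context(f"abnormal vitals: {vitals}")
    conversation = [{
        "role": "user",
        "content": (f"Live vitals: {vitals}\nHistory: {history}\n"
                    "Reference material:\n" + "\n".join(kb_chunks) +
                    "\nWhat is the likely cause and the recommended next step?"),
    }]
    reply = llm.chat.completions.create(model="gpt-4o", messages=conversation)
    conversation.append({"role": "assistant",
                         "content": reply.choices[0].message.content})
    send_to_doctor(patient_id, conversation[-1]["content"])   # hypothetical notification
    return conversation   # kept in memory so follow-up questions share the context
```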
Conclusion
The integration of Large Language Models (LLMs) within real-time applications has the potential to greatly enhance the productivity of knowledge workers. LLMs, such as GPT, have become increasingly valuable tools for knowledge workers because of their ability to understand and produce new content. However, LLMs have limitations, including their reliance on pre-trained data and limited context size.
To overcome these limitations, incorporating knowledge bases through semantic search capabilities can provide additional relevant information to LLMs. By encoding knowledge bases and documentation as embeddings in a vector database, knowledge workers can efficiently retrieve relevant information for their queries. Combining semantic search results with real-time status information, historical data, and user prompts can generate more context-rich responses from LLMs.
Vantiq, a real-time application platform, offers capabilities to support LLM integration and semantic search. The platform enables the creation and storage of embeddings in a vector database, allowing for efficient querying and semantic search. It also provides access to various LLM capabilities, integration with commercial and open-source LLMs, and the flexibility to construct prompts using a templating engine.
By integrating Vantiq’s LLM and semantic search capabilities with existing features like in-memory state management and collaboration tools, developers can build powerful real-time applications. These applications can provide knowledge workers with refined views of patient or equipment status, offer smart notifications and alerts, and assist in decision-making processes.
Overall, the combination of LLMs and real-time applications, facilitated by platforms like Vantiq, holds great promise in enhancing knowledge workers’ efficiency and providing access to relevant and summarized information in a timely manner.