Just a year on from its debut, ChatGPT is a phenomenally disruptive technology. CIOs know it can’t be ignored, but it comes with risks. What is the best way forward here?
The underlying technology that powers ChatGPT is called a large language model (LLM). LLMs are based on the transformer architecture, which allows a system to generate fluent responses by repeatedly predicting the next token, rather than only classifying or extracting text as many prior NLP (natural language processing) approaches did. Accordingly, LLMs are very large deep-learning models trained on vast amounts of data to be able to generate new content.
LLMs can achieve remarkable results, and some have reportedly passed versions of the Turing Test, meaning their responses can pass as human. However, one of the issues plaguing generative AI (GenAI) is hallucination, where the model produces plausible-sounding output that is wrong, so its answers can’t be fully trusted. Other challenges include the high cost of training and scaling the technology, the difficulty of auditing it, and a corresponding lack of transparency about how the system arrived at its (seemingly plausible) answers.
In contexts where compliance or safety is important, we can’t accept the answers we get from LLMs at face value. This poses a challenge for enterprises aspiring to use LLMs more extensively. Many of these issues can be mitigated by combining LLMs with a knowledge graph, vector search, and retrieval-augmented generation (RAG).
Unlocking useful insights
Let’s look at each of these components individually. A knowledge graph is an information-rich structure that provides a view of entities and how they interrelate. It enables us to represent these entities and connections as a network of curated, deterministic, and verifiable facts, forming a structured graph of collective knowledge. Increasingly, a useful application is to integrate a knowledge graph with an LLM, establishing a fact-checking layer that improves accuracy.
The beauty of a knowledge graph is that it can provide context that goes beyond the scope of an LLM. In particular, it’s easy and cost-effective to add data to a knowledge graph in real time, whereas retraining an LLM that is just weeks or months old with new facts is prohibitively expensive.
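To make the idea concrete, here is a minimal sketch of a knowledge graph held as subject-predicate-object facts. It is a toy in-memory structure, not how any particular graph database works, and the class name, method names, and company data are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory graph of (subject, predicate, object) facts."""

    def __init__(self):
        # Maps each subject to the set of (predicate, object) pairs known about it.
        self._edges = defaultdict(set)

    def add_fact(self, subject, predicate, obj):
        # Adding a fact is a single cheap write; nothing is retrained.
        self._edges[subject].add((predicate, obj))

    def has_fact(self, subject, predicate, obj):
        # Deterministic lookup: a fact is either stored or it is not.
        return (predicate, obj) in self._edges.get(subject, set())

    def facts_about(self, subject):
        # Return the known facts in a stable, sorted order.
        return sorted(self._edges.get(subject, set()))


# Hypothetical example data.
kg = KnowledgeGraph()
kg.add_fact("Acme Ltd", "HEADQUARTERED_IN", "London")
kg.add_fact("Acme Ltd", "SUPPLIES", "Widget Co")

# A new fact arrives after the LLM was trained; the graph absorbs it at once.
kg.add_fact("Acme Ltd", "ACQUIRED", "Widget Co")
```

The point of the sketch is the asymmetry in cost: updating the graph is one write, whereas the same fact would only reach the LLM itself through another round of training.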
The glue that developers are using to join LLMs with knowledge graphs is vector search over embeddings. While AI practitioners tend to use vectors to perform neighbourhood searches in a high-dimensional space, those same vectors can be used as keys into the curated, deterministic knowledge graph. This means you can map terms from the LLM onto nodes in the knowledge graph and then use graph algorithms and queries to find accurate data to enrich the response.
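As a rough illustration of vectors acting as keys into the graph, the sketch below finds the graph node whose embedding is closest to a query vector and then reads that node's curated facts. The three-dimensional embeddings are made up by hand, and the node names and facts are hypothetical; in practice the vectors would come from an embedding model and the search from a vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written embeddings keyed by graph node name.
NODE_EMBEDDINGS = {
    "Acme Ltd": [0.9, 0.1, 0.0],
    "Widget Co": [0.8, 0.3, 0.1],
    "Globex plc": [0.1, 0.9, 0.2],
}

# Hypothetical curated facts stored on each node.
GRAPH = {
    "Acme Ltd": [("HEADQUARTERED_IN", "London"), ("SUPPLIES", "Widget Co")],
    "Widget Co": [("HEADQUARTERED_IN", "Leeds")],
    "Globex plc": [("HEADQUARTERED_IN", "Bristol")],
}


def nearest_node(query_vec):
    # Neighbourhood search: pick the node with the most similar embedding.
    return max(NODE_EMBEDDINGS, key=lambda n: cosine(query_vec, NODE_EMBEDDINGS[n]))


def lookup_facts(query_vec):
    # The vector match is fuzzy, but what it unlocks is exact graph data.
    node = nearest_node(query_vec)
    return node, GRAPH[node]


node, facts = lookup_facts([0.95, 0.05, 0.0])
```

The similarity step tolerates fuzzy wording, while everything returned after it comes from deterministic graph lookups.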
Capping all this is the increasing use of a third LLM-enhancing technique called retrieval-augmented generation (RAG), as exemplified by language model integration frameworks like LangChain. The RAG pattern formalises the interactions with the LLM, using the knowledge graph both to supply relevant facts and to fact-check responses from the LLM before they reach the user. Used with a knowledge graph, RAG makes that fact-checking richer.
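The RAG pattern can be sketched in a few lines, under some loud assumptions: the model call is a stub function rather than a real LLM, the graph is a plain dictionary of hypothetical facts, and the fact-check is a naive exact match. None of this reflects LangChain's actual API; it only shows the retrieve, generate, and verify steps in order.

```python
# Hypothetical curated facts, standing in for a real knowledge graph.
GRAPH = {
    "Acme Ltd": [("HEADQUARTERED_IN", "London"), ("SUPPLIES", "Widget Co")],
}


def retrieve(graph, entity):
    # Retrieval step: turn the entity's graph facts into plain statements.
    return [f"{entity} {pred} {obj}" for pred, obj in graph.get(entity, [])]


def build_prompt(question, facts):
    # Augmentation step: ground the prompt in the retrieved facts.
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


def stub_llm(prompt):
    # Stand-in for a real model call (assumption: a real system calls an LLM here).
    return "Acme Ltd HEADQUARTERED_IN London"


def is_supported(answer, facts):
    # Naive fact-check: accept the draft only if it matches a retrieved fact exactly.
    return answer in facts


def answer_with_rag(question, entity, graph, llm):
    facts = retrieve(graph, entity)
    draft = llm(build_prompt(question, facts))
    if is_supported(draft, facts):
        return draft
    # Unverified drafts never reach the user.
    return "No verified answer available."


verified = answer_with_rag("Where is Acme Ltd based?", "Acme Ltd", GRAPH, stub_llm)
```

A production system would replace the exact-match check with something more forgiving, such as extracting claims from the answer and querying the graph for each, but the control flow stays the same.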
For example, Basecamp Research, a UK-based biological data company, is revolutionising protein discovery from biodiverse natural environments. It uses AI to search for commercially useful molecules and biomes and has built the world’s most extensive natural biodiversity knowledge graph. The company understands the importance of linking LLMs with graph technology, and is now upgrading to a fully LLM-augmented knowledge graph to overcome the limitations of GenAI.
Basecamp Research isn’t the only trailblazer in pairing LLMs with graphs. A multinational energy company also uses a knowledge graph with ChatGPT in the cloud to create an enterprise knowledge hub.
This year, we will see a lot more of this kind of combination as enterprises look to create intuitive and informative LLM-based decision support systems.