Reducing LLM hallucinations with trusted data

Large language models (LLMs) have a wide range of uses, but LLM hallucinations, also called AI hallucinations, pose serious risks to organisations and can lead to mistrust amongst consumers. LLMs are statistical models trained to predict the next word based on patterns in their training data; they are not “intelligent” or thinking beings, as they are sometimes portrayed in popular media. As a result, models can generate outputs that seem factual but are wholly fabricated. In addition, the data used to train these models may be of low quality, and LLMs cannot fact-check it on their own. With faulty inputs, end users cannot be confident that the model is providing trustworthy information.

So, how can organisations address AI hallucinations? With today’s technology, it is impossible to eliminate hallucinations entirely, but organisations can significantly reduce them by improving the quality of the data that models are trained on and draw from. One strategy that helps mitigate the problem is to apply a robust data unification framework. This approach improves data quality and trustworthiness, creating a solid foundation for responsible AI. Implementing the strategy involves several steps, described in the sections that follow.

Tailor LLMs for business-specific knowledge

As organisations use LLMs to enhance their operations and support customer service, it is critical that they reduce hallucinations. Businesses should review the data that trains and informs the model, and introduce sources that can produce personalised, business-specific responses. This can be done through retrieval-augmented generation (RAG), which involves retrieving relevant information from a database of company information and supplying it to the LLM so that its responses are grounded in that data.
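As an illustration of the retrieval step, here is a minimal sketch in Python. It is not a production RAG system: the `DOCUMENTS` list, the function names, and the word-overlap scoring are all assumptions made for this example, and a real deployment would more likely use embeddings and a vector index. The prompt it builds would then be sent to whichever LLM the organisation uses.

```python
import re

# Hypothetical company knowledge base (illustrative data only).
DOCUMENTS = [
    "Refunds are issued within 14 days of receiving the returned item.",
    "Premium support is available on weekdays from 08:00 to 18:00 GMT.",
    "Orders above 50 GBP qualify for free standard delivery.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question.

    Word overlap is a simple stand-in for semantic (vector) search.
    Documents with no overlap at all are never returned.
    """
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_prompt(question, documents):
    """Combine the retrieved company data and the question into one prompt."""
    context = retrieve(question, documents)
    if context:
        context_block = "\n".join(f"- {doc}" for doc in context)
    else:
        context_block = "(no relevant company data found)"
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )

print(build_prompt("How long do refunds take?", DOCUMENTS))
```

Instructing the model to answer only from the supplied context, and to say so when the context is insufficient, is one common way to discourage it from filling gaps with invented detail.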

However, RAG alone will not suppress AI hallucinations. For a RAG implementation to succeed, the data it retrieves must be high quality: accurate, complete, and relevant. Businesses must therefore invest in a robust data management and unification system that includes quality controls and supports real-time updates.
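A quality control of this kind can be as simple as a gate that runs before records are made available for retrieval. The sketch below is only an assumption of what such a gate might look like: the required fields, the 90-day freshness threshold, and the function names are invented for illustration and would differ in any real system.

```python
from datetime import datetime, timedelta, timezone

# Assumed record schema and freshness threshold (illustrative only).
REQUIRED_FIELDS = ("id", "text", "updated_at")
MAX_AGE = timedelta(days=90)

def quality_issues(record, now):
    """Return the reasons a record should not be indexed (empty list if it passes)."""
    issues = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    updated_at = record.get("updated_at")
    if updated_at and now - updated_at > MAX_AGE:
        issues.append("stale")
    return issues

def filter_for_indexing(records, now=None):
    """Keep only records that are complete and recently updated."""
    now = now or datetime.now(timezone.utc)
    return [record for record in records if not quality_issues(record, now)]

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": "a", "text": "Returns policy", "updated_at": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"id": "b", "text": "Old price list", "updated_at": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"id": "c", "text": "", "updated_at": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
print([record["id"] for record in filter_for_indexing(records, now)])
```

Returning the list of reasons, rather than a simple pass or fail, makes it easier to report rejected records back to the data owners for correction.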

Organisations can also use graph augmentation. This involves drawing on a structured knowledge graph of the entities and relationships across the enterprise, which enables company-specific terminology and facts to be included in the outputs. As with RAG, the effectiveness of graph augmentation depends on the quality of the data used, but it adds another layer of control and assurance to the quality of AI-powered responses.
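To make the idea concrete, the following Python sketch stores a tiny knowledge graph as subject-relation-object triples and looks up the facts for any entity named in a question. The company names, relations, and the `facts_for` helper are all hypothetical; a real implementation would typically sit on a graph database and use proper entity recognition rather than substring matching.

```python
# Hypothetical enterprise knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("Acme Ltd", "is_customer_of", "Northwind"),
    ("Acme Ltd", "account_manager", "J. Patel"),
    ("Northwind", "headquartered_in", "Leeds"),
]

def facts_for(question, triples):
    """Return, as plain sentences, the triples whose subject is named in the question."""
    lowered = question.lower()
    facts = []
    for subject, relation, obj in triples:
        # Naive entity matching: a case-insensitive substring check.
        if subject.lower() in lowered:
            facts.append(f"{subject} {relation.replace('_', ' ')} {obj}")
    return facts

print(facts_for("Who is the account manager for Acme Ltd?", TRIPLES))
```

The returned facts can be added to the prompt alongside any retrieved documents, so that the model is given the organisation's own terminology and relationships rather than having to guess them.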

Leveraging trusted and unified data for LLMs

Modern cloud-native data unification and management approaches are key to addressing the data challenges of LLMs. These approaches include master data management (MDM), entity resolution, and data products, which together are the essential parts of creating business-critical core data. By unifying data from multiple sources and feeding it to LLMs continuously, accurately, and in real time, these systems become key pillars in ensuring that the models produce reliable outputs.
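Entity resolution, for instance, means recognising that records held in different systems describe the same real-world entity and merging them into one trusted record. The Python sketch below shows the idea in its simplest form, under stated assumptions: records are matched on a normalised email address only, and the first non-empty value for each field wins. The record layout and the `unify` function are invented for this example; commercial MDM platforms use far richer matching and survivorship rules.

```python
def normalise_email(email):
    """Trim whitespace and lowercase so that trivially different emails match."""
    return email.strip().lower()

def unify(records):
    """Group records by normalised email and merge each group into one record.

    Survivorship rule (an assumption for this sketch): for each field,
    the first non-empty value encountered is kept.
    """
    golden = {}
    for record in records:
        key = normalise_email(record["email"])
        merged = golden.setdefault(key, {"email": key, "sources": []})
        merged["sources"].append(record["source"])
        for field, value in record.items():
            if field in ("email", "source"):
                continue
            if value and not merged.get(field):
                merged[field] = value
    return list(golden.values())

# Illustrative records from two hypothetical systems.
records = [
    {"source": "crm", "email": "Ana.Silva@Example.com ", "name": "Ana Silva", "phone": ""},
    {"source": "billing", "email": "ana.silva@example.com", "name": "A. Silva", "phone": "555-0100"},
]
print(unify(records))
```

Keeping the list of contributing sources on the merged record preserves lineage, which helps when a downstream answer needs to be traced back to the systems it came from.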

One of the essential aspects of a robust data unification and management solution is the use of canonical data models. Because these models allow businesses to unify data seamlessly from several sources and across different entities, they create consistency and accuracy. They must also scale to handle the vast amount of data that is generated and consumed, so that the LLM always has access to the most up-to-date information.
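A canonical data model can be pictured as a single agreed shape that every source system's records are mapped into. The sketch below is one possible way to express that in Python; the `CanonicalCustomer` fields, the two source systems, and their field names are assumptions chosen for illustration, not a description of any particular product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanonicalCustomer:
    """The single agreed shape for a customer, whatever the source system."""
    customer_id: str
    full_name: str
    country: str  # upper-case two-letter country code

# Hypothetical mapping from canonical field names to each source's field names.
FIELD_MAPS = {
    "crm": {"customer_id": "id", "full_name": "name", "country": "country_code"},
    "billing": {"customer_id": "cust_no", "full_name": "customer_name", "country": "ctry"},
}

def to_canonical(source, record):
    """Translate a source-specific record into the canonical model."""
    mapping = FIELD_MAPS[source]
    values = {field: str(record[src]).strip() for field, src in mapping.items()}
    values["country"] = values["country"].upper()
    return CanonicalCustomer(**values)

crm_row = {"id": 17, "name": " Ana Silva ", "country_code": "gb"}
billing_row = {"cust_no": "17", "customer_name": "Ana Silva", "ctry": "GB"}
print(to_canonical("crm", crm_row) == to_canonical("billing", billing_row))
```

Once both rows are in the canonical shape they compare as equal, which is what allows downstream matching, merging, and retrieval to work consistently across sources.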

As well as scalability, application programming interface (API)-driven performance is essential for real-time data availability and automation. Security-compliant APIs integrate the data unification platform with the LLMs, enabling rapid data access and processing. This also reduces the chance of the model generating inconsistent outputs, which in turn increases trust amongst users.

Creating successful LLMs now and into the future

As AI becomes ever more deeply integrated into our lives, there is a great need for these models to be based on trusted, reliable, and transparent data. The recently passed EU AI Act seeks to hold companies to higher standards. Organisations will need a foundation of trusted, high-quality data to ensure compliance with the new law.

Businesses must also continue to comply with rigorous data privacy laws. In Europe, they could face a fine of up to €10 million, or 2 percent of the organisation’s global annual revenue from the preceding financial year, whichever is higher, according to GDPR.EU. For the most serious infringements, the ceiling rises to €20 million or 4 percent. While data privacy compliance is crucial, the erosion of customer trust caused by misleading or biased data sets being used in these LLMs will have a long-term impact on the reputation of the organisation and its bottom line.

There is no shying away from the significant role AI and data play in business operations, but it is of the utmost importance that LLMs are built on a robust data foundation. With strong data unification and management strategies, businesses are much better placed to create trusted and up-to-date LLMs. Organisations must lay the groundwork now in order to reduce hallucinations and successfully reap the benefits of AI to improve operations and customer service practices.

Ansh Kanwar is Executive Vice President (EVP) of Technology, Product and Strategy at Reltio. He oversees Reltio’s global software engineering, product management, and technology operations.

Ansh has extensive experience in product management, software development, product marketing, security, cloud computing, and technology operations. Over the last 23 years, he has held numerous senior technical and management roles, including Vice President, Technology Operations at Citrix Systems; Chief Technology Officer at LogMeIn; and General Manager, Products and Technology at Onapsis.

Ansh is a public speaker and loves discussing data unification and management, AI in data, data products, the ethical use of data, and building SaaS products at scale. He has a Bachelor’s in Computer Engineering from Delhi University, an MS in Electrical and Computer Engineering from the University of California, Santa Barbara, and an MBA from the MIT Sloan School of Management. He lives in Cambridge, MA.
