Of all the real-world applications of Artificial Intelligence (AI), one of the most important is the role it can play in managing enterprise content.

Content is an integral part of modern business strategy, helping enterprises design and launch products, drive marketing, increase sales and much more. It is essential to effective customer experiences (call centre operations, customer self-service, etc.) and to numerous back-office processes (such as claims processing and underwriting). Yet content has long been a challenge from an information management perspective. It can be nearly impossible to find because of inadequate and inconsistent metadata, limited search functionality within core business applications, and the simple fact that it is often scattered across many disconnected systems and repositories.

With legacy content management systems struggling to cope with today’s content volumes and types, newer content management systems with AI capabilities have emerged to fill the gap. These solutions can extract critical data from content and, in doing so, turn it into intelligent information that is easy to find, ready to use in day-to-day work, and always accessible.

An avalanche of content

In a recent Nuxeo survey of UK financial services companies, 80% of respondents said their systems were not fully integrated, and their organisations had, on average, nine different content management systems in place. The picture is likely to be similar in other sectors, and as the importance of content has grown, so too have its volume and variety, making it even harder to manage.

In fact, the volume and types of content are growing at an unprecedented rate. Many enterprises have accumulated billions of documents and scanned images over the last 20 years, and today some are looking to capture and manage as many as 500 million new objects per month. That is an astonishing amount of new information, and it is far more likely to keep growing than to slow down.

Extracting information from content and entering it into fields and tables is work that people inherently dislike. Doing it across thousands, or even hundreds of thousands, of new documents every day is challenging, expensive and difficult to do with consistent accuracy. This is why so many organisations have struggled with enterprise content management (ECM) for so long: most traditional systems required extensive manual input, which is both time-consuming and expensive.

The value of custom machine learning models

To manage this content and extract maximum value from it, enterprises are turning to AI and, in particular, machine learning. Many technology firms, including Amazon, Google and Microsoft, offer commodity AI services that can be used to work with various forms of content.

Many of these machine learning offerings focus on providing greater insight into, and understanding of, content, whether that is text-based documents, photos and images, or audio and video files. A lot of value can be derived from these generic models and services, particularly for routine tasks over high volumes of content. OCR, sentiment analysis, translation and transcription of audio and video files (speech to text) are all examples of generic machine learning and deep learning services.

But the real value of AI and machine learning can only be realised when these models are trained with an organisation’s own data and content. A custom model, trained on the organisation’s own data sets, can produce much more accurate and contextual data about content, which in turn is more valuable to the business. For example, with a custom machine learning model, a user could identify specific products in photographs or videos, classify accident damage in automobile claim photos, identify vital corporate records and even detect potentially fraudulent claims.

While it is still early days for this technology, some of the initial use cases for custom models include:

Data Enrichment – this is all about extracting data from content and using that data to make that content more accessible, more contextual and more valuable to the business. Enrichment can take on many different forms, depending on what type of content a business is working with.

Within a Content Services Platform, data provided by custom machine learning models is applied to the object (an image, video, document or other type of content) as metadata, which is indexed and can later be used to find and retrieve that object. This is the difference between a generic AI model auto-tagging a car in a picture as blue and a custom model that provides more valuable information such as the manufacturer, model and even trim level. Such data can also be used to trigger workflows, initiate processes and be passed to other integrated systems.
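To make the enrichment step concrete, here is a minimal Python sketch, assuming a custom model that returns a value and a confidence score for each field it predicts. The names `ContentObject`, `MetadataIndex`, `enrich` and `workflows_to_trigger` are hypothetical and do not correspond to any particular Content Services Platform API; the dictionary-based inverted index stands in for a real search index, and the car attributes are invented sample values.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """A stored object (image, video, document) and its metadata."""
    object_id: str
    kind: str
    metadata: dict = field(default_factory=dict)


class MetadataIndex:
    """Toy inverted index mapping (field, value) pairs to object ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, obj: ContentObject) -> None:
        # Index every metadata field so the object can be found by any of them.
        for key, value in obj.metadata.items():
            self._postings[(key, str(value).lower())].add(obj.object_id)

    def find(self, key: str, value: str) -> set:
        # Case-insensitive exact match on a single field.
        return set(self._postings.get((key, value.lower()), set()))


def enrich(obj: ContentObject, predictions: dict, min_confidence: float = 0.8):
    """Apply model predictions as metadata, skipping low-confidence ones.

    `predictions` maps field name -> (value, confidence), the output shape
    assumed here for a hypothetical custom model.
    """
    for key, (value, confidence) in predictions.items():
        if confidence >= min_confidence:
            obj.metadata[key] = value
    return obj


def workflows_to_trigger(obj: ContentObject, rules: dict) -> list:
    """Return the names of workflows whose (field, value) condition matches."""
    return [name for name, (key, value) in rules.items()
            if obj.metadata.get(key) == value]


if __name__ == "__main__":
    photo = ContentObject("img-001", "image")
    enrich(photo, {
        "manufacturer": ("Acme Motors", 0.97),
        "model": ("Roadster", 0.91),
        "trim": ("Sport", 0.62),  # below the threshold, so not applied
    })
    index = MetadataIndex()
    index.add(photo)
    print(index.find("model", "roadster"))  # {'img-001'}
    print(workflows_to_trigger(photo, {"fleet-review": ("manufacturer", "Acme Motors")}))  # ['fleet-review']
```

Only predictions above the confidence threshold become metadata, so low-confidence guesses (the trim level in the example) do not pollute search results; a real platform might instead hold them for human review.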

AI can also be used to enrich traditional content, such as documents and scanned images. Many financial services firms, for example, have large volumes of existing TIFF images that they want to convert into PDF documents so they can be indexed and searched on their full text. Users can then search within a PDF document, perhaps to find a particular word, phrase or contractual term.

To accomplish this, a public AI service such as Amazon Textract can be used to run OCR on the TIFF image and extract its text. A transformation service (not AI) then maps this text back onto the original image and converts the image into a PDF document. This document is ingested into the Content Services Platform and indexed for search and access.
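The OCR half of this pipeline can be sketched in Python with boto3, the AWS SDK. `detect_document_text` is Textract’s synchronous text-detection call; it returns `Blocks` whose bounding boxes are expressed as ratios of the page width and height, measured from the top-left corner. The helper `lines_in_pdf_space` (a hypothetical name) converts each `LINE` block into PDF coordinates, which start at the bottom-left, so that a transformation service could place an invisible text layer over the page image. This is a sketch under stated assumptions: a single-page image, configured AWS credentials, and the PDF writing itself left to the transformation service (not shown).

```python
def ocr_with_textract(image_bytes: bytes) -> dict:
    """Run Textract's synchronous text detection on a single-page image."""
    import boto3  # imported here so the parsing helper works without the SDK

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": image_bytes})


def lines_in_pdf_space(response: dict, page_width: float, page_height: float) -> list:
    """Convert Textract LINE blocks into (text, x, y, width, height) tuples.

    Textract bounding boxes are ratios of the page size with the origin at the
    top-left; PDF user space has its origin at the bottom-left, so the y axis
    is flipped. page_width and page_height are in PDF points (1/72 inch).
    """
    placed = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks; this sketch places whole lines
        box = block["Geometry"]["BoundingBox"]
        width = box["Width"] * page_width
        height = box["Height"] * page_height
        x = box["Left"] * page_width
        # Bottom edge of the box, measured up from the bottom of the page.
        y = page_height - (box["Top"] + box["Height"]) * page_height
        placed.append((block["Text"], x, y, width, height))
    return placed
```

In use, the bytes of a scanned page would go to `ocr_with_textract`, and the response would be passed to `lines_in_pdf_space` together with the output page size (612 × 792 points for US Letter, for instance). For multi-page TIFF files, Textract’s asynchronous `start_document_text_detection` operation, which reads the document from Amazon S3, would be the more likely fit.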

Automation – another use case for machine learning is helping companies automate critical business functions and processes. Many enterprises still process millions of paper forms every year, often handwritten. A critical challenge is to determine whether a form has been completed correctly before processing it.

This is highly labour-intensive and therefore costly: people have to determine what type of form it is, then check that the necessary responses have been provided, that signatures are in the right place and, where required, that confidential customer information is present and correct.

Machine learning models can take over much of this formerly manual work. Organisations can capture these forms from a variety of sources (fax, email or paper) and convert them into digital images or documents. Machine learning is then used to identify each type of form and validate the information provided, in most cases before a human even looks at the document. For some large financial services organisations, where more than 60% of forms are still completed by hand, this simple automation can save millions annually.
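The validation step can be illustrated with a minimal rule-based sketch in Python. It assumes an upstream model has already classified the form and extracted its fields (including whether a signature was detected); the form types, field names and the 0.9 confidence threshold are hypothetical examples, not taken from any real system. Forms that pass every check go straight through, while the rest are routed to a person along with a list of the problems found.

```python
from dataclasses import dataclass, field

# Hypothetical form catalogue: form type -> fields that must be filled in.
REQUIRED_FIELDS = {
    "change_of_address": {"policy_number", "new_address", "signature"},
    "claim": {"policy_number", "incident_date", "description", "signature"},
}


@dataclass
class ExtractedForm:
    form_type: str          # label predicted by an upstream classifier
    type_confidence: float  # classifier confidence for that label
    fields: dict = field(default_factory=dict)  # name -> extracted value


def _is_blank(value) -> bool:
    # None/False mean "not found" (e.g. no signature detected); so do blank strings.
    if value is None or value is False:
        return True
    return isinstance(value, str) and not value.strip()


def validate(form: ExtractedForm, min_confidence: float = 0.9):
    """Return (route, problems); route is 'auto' or 'human_review'."""
    problems = []
    if form.type_confidence < min_confidence:
        problems.append(f"uncertain form type ({form.type_confidence:.2f})")
    required = REQUIRED_FIELDS.get(form.form_type)
    if required is None:
        problems.append(f"unknown form type {form.form_type!r}")
    else:
        for name in sorted(required):
            if _is_blank(form.fields.get(name)):
                problems.append(f"missing {name}")
    return ("auto" if not problems else "human_review"), problems
```

Keeping the required-field rules in a plain table such as `REQUIRED_FIELDS` separates what the model does (classifying and reading the form) from the business rules, which can then be changed without touching the model.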

Insight – for many organisations, the insight contained within their content is vastly under-utilised. AI can help extract insight and intelligence from existing business content and deliver it back to users so they can make better-informed decisions.

Insurance is one example. Fraudulent claims remain an issue, and it is not uncommon for claimants to reuse the same accident photos across multiple claims. At a large property and casualty (P&C) insurer with thousands of claims processors, the chance that the same processor will handle two claims with the same photos is extremely small. Machine learning can help by comparing new claim photos with a vast repository of existing photos (a Content Lake), quickly identifying duplicates or near-duplicates and then automatically launching a fraud investigation process.
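The article describes machine learning for this comparison; the sketch below swaps in a simpler, non-ML technique, a difference hash (dHash), to show the shape of near-duplicate matching. Each photo is reduced to a 64-bit fingerprint, and two photos count as near-duplicates when their fingerprints differ in only a few bits (the Hamming distance). It assumes each image has already been shrunk to 9×8 grayscale pixels by an imaging library; the 5-bit threshold and the photo ids are illustrative assumptions. A learned image-embedding model compared by cosine similarity would follow the same pattern with a different fingerprint.

```python
def dhash(pixels: list) -> int:
    """64-bit difference hash of an image given as 8 rows of 9 grayscale values.

    Each bit records whether a pixel is brighter than its right-hand neighbour,
    so the hash depends on the image's gradients rather than exact pixel values
    (a uniform brightness shift, for example, leaves it unchanged).
    """
    if len(pixels) != 8 or any(len(row) != 9 for row in pixels):
        raise ValueError("expected 8 rows of 9 grayscale pixels")
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_near_duplicates(new_hash: int, archive: dict, max_distance: int = 5) -> list:
    """Ids of archived photos whose hash is within max_distance bits of new_hash."""
    return sorted(photo_id for photo_id, h in archive.items()
                  if hamming(new_hash, h) <= max_distance)


if __name__ == "__main__":
    original = [[0 if c % 2 == 0 else 200 for c in range(9)] for _ in range(8)]
    brightened = [[v + 40 for v in row] for row in original]  # same photo, edited
    archive = {"claim-0017/photo-2": dhash(original)}  # hypothetical ids
    if find_near_duplicates(dhash(brightened), archive):
        print("possible reused photo: start fraud review")
```

Scanning every archived hash is fine for a sketch, but a repository of millions of photos would need an index suited to Hamming-distance search, such as a BK-tree or locality-sensitive hashing.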

As the volume and variety of content in enterprises grows, so does the need to manage it more effectively. AI and machine learning can unlock the insight and value within that content, and applying these technologies to content will become one of the more widely deployed AI use cases over the next few years.