With the advent of ever stricter data privacy and protection regulations – as well as stringent client requirements with regard to where their data resides – organisations have become finely attuned to the importance of storing data within specific geographic regions, lest a misstep put them out of compliance.

But this is where AI can potentially throw a spanner in the works. The data might be stored in a specific geographic region – but when an AI service is run against that data, where does that processing take place?

As new regulatory frameworks like the proposed EU AI Act come into play, it will become increasingly important for organisations to ferret out these types of hidden risks if they want to leverage AI safely – and asking a few key questions can help them navigate a path forward.

Storage over here, processing over there…

Sometimes – for reasons of cost or operational efficiency – vendors will have compute clusters in various locations around the globe, each responsible for running specific services on the data, whether that’s OCR, search indexing, or AI.

This approach can quickly lead to data sovereignty pitfalls. Say that data is stored by the vendor in the EU, but their AI processing is handled by a cluster of servers in the United States.

Whenever someone wants to run AI on a document that is stored in the EU, the vendor will send it over to the United States, process it through the AI engine located there, and then send the result back to the end user who ran the query.

This is all invisible to the customer, who is likely unaware that the data has moved and been processed in the United States. The vendor, meanwhile, can place a hand on the heart and swear that data is stored in the EU – even though it takes a brief journey across the pond any time AI is run against it.

The lesson here? Customers need to make sure they’re asking their vendors the right questions.

Customers should always ask their vendor not only where data is stored, but also where processing of that data takes place. Vendors who have placed not just storage servers but AI clusters across the globe – rather than in one specific geographic area – are best situated to keep data stored and processed in the same region, so that it doesn’t leave places it shouldn’t.
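To make the storage-versus-processing question concrete, here is a minimal Python sketch of how a customer might record a vendor's answers and flag any processing that happens outside the permitted region. The `VendorService` class, its field names, and the region labels are all hypothetical, invented purely for illustration; they do not describe any real vendor's API or reporting format.

```python
from dataclasses import dataclass, field


@dataclass
class VendorService:
    """Hypothetical record of a vendor's answers to residency questions."""
    name: str
    storage_region: str
    # Maps each processing operation (e.g. "ocr", "ai") to the region it runs in.
    processing_regions: dict[str, str] = field(default_factory=dict)


def out_of_region_processing(service: VendorService,
                             allowed_regions: set[str]) -> dict[str, str]:
    """Return the operations that run outside the allowed regions."""
    return {
        operation: region
        for operation, region in service.processing_regions.items()
        if region not in allowed_regions
    }


# Illustrative data only: storage is in the EU, but AI processing is not.
service = VendorService(
    name="Example Vendor",
    storage_region="EU",
    processing_regions={"ocr": "EU", "search_indexing": "EU", "ai": "US"},
)

violations = out_of_region_processing(service, allowed_regions={"EU"})
storage_ok = service.storage_region in {"EU"}
print(f"Storage in region: {storage_ok}; out-of-region processing: {violations}")
```

In this made-up example the storage check passes while the AI operation is flagged, which is exactly the situation in which a vendor can truthfully say the data is "stored in the EU" even though it is processed elsewhere.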

How many clouds, exactly?

As part of their due diligence, customers should also determine just how many cloud providers are being utilised to provide the overall AI service. Is it just one cloud provider? Or is it multiple cloud providers?

For example, maybe a vendor stores data on their private cloud, then hands it off to Large Cloud Vendor A, and then over to Large Cloud Vendor B for AI processing, before returning it to their private servers. That’s three different clouds the vendor is moving data across.

All three clouds may well be in the proper geographic region, but it becomes harder to pin down the location as more clouds get involved.

Additionally, increasing the number of cloud vendors increases the risk profile. The more companies in the supply chain, the greater the risk of something going wrong. All it takes is one weak link in the chain for a seemingly secure service to become compromised.

Customers should specifically ask any AI vendor they’re utilising how many vendors they are relying on to deliver the service. Knowing the full extent of the supply chain helps shine a light on the potential risk of that chain.
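As a rough illustration of what "knowing the full extent of the supply chain" might look like in practice, the Python sketch below takes the path a document follows and reports how many distinct providers handle it and whether any hop falls outside the permitted regions. The function name, the hop format, and the region labels are assumptions made for this example; the provider names simply mirror the hypothetical three-cloud scenario described earlier.

```python
def summarise_supply_chain(hops: list[tuple[str, str]],
                           allowed_regions: set[str]) -> dict:
    """Summarise a data path given as (provider, region) hops, in order."""
    providers: list[str] = []
    for provider, _region in hops:
        # Keep first-seen order while counting each provider only once.
        if provider not in providers:
            providers.append(provider)

    off_region_hops = [
        (provider, region)
        for provider, region in hops
        if region not in allowed_regions
    ]
    return {
        "providers": providers,
        "provider_count": len(providers),
        "off_region_hops": off_region_hops,
    }


# Illustrative path: private cloud -> Vendor A -> Vendor B -> private cloud.
hops = [
    ("Vendor private cloud", "EU"),
    ("Large Cloud Vendor A", "EU"),
    ("Large Cloud Vendor B", "EU"),
    ("Vendor private cloud", "EU"),
]

summary = summarise_supply_chain(hops, allowed_regions={"EU"})
print(summary["provider_count"], summary["off_region_hops"])
```

Even when every hop stays in region, as in this example, the provider count itself is worth recording, since each additional company in the chain is another party whose controls have to be trusted.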

No prying eyes on sensitive files

Given the confidential information that so many of today’s organisations traffic in, one of the most essential things to clarify with their vendor is whether their data is being used to train the underlying large language model (LLM) that powers the AI service.

The answer should be an unqualified “no.” When the LLM is presented with a document and a question to ask of that document, the LLM should provide an answer and then forget both the document and the question. It should not be storing any of that information unless the customer has explicitly allowed it to.

Additionally, many of the large AI service providers have an “abuse monitoring” program that allows them to monitor the questions being asked of their AI engines, to ensure those engines aren’t being used in a harmful manner. It’s important for customers to ask whether their vendor is subject to that monitoring policy, or whether it is exempt. At the end of the day, customers have a right to know who might be monitoring their traffic and the questions they ask of their AI, as that data trail provides a window into the sensitive content they’ve been entrusted with.

Steer clear of potential pitfalls

In the rush to embrace AI and make it part of their operations, there are critical areas that organisations can’t afford to overlook if they hope to avoid hidden risks. Fortunately, by homing in on the right questions during the due diligence phase, they can minimise risk around everything from data sovereignty, to how many different clouds the data moves across, to who’s able to monitor the questions being asked of the AI. This careful approach empowers organisations to leverage AI confidently, ensuring better business outcomes while steering clear of potential pitfalls.


Paul Walker, as EMEA Technical Director for iManage, drives all manner of technical initiatives related to customer and partner organisations in the region. With over 20 years’ experience in the professional services sector, alongside a software development background, he advises clients on areas such as document management, knowledge management, e-disclosure and information governance.
