Companies are increasingly deploying artificial intelligence (AI) as they race to extract value from the personal data they hold. As with any project using personal data, AI is subject to data protection requirements arising from the GDPR in Europe and similar laws elsewhere. However, some types of AI, particularly systems based on machine learning, pose specific data protection risks and challenges. The Information Commissioner’s Office (ICO) recently issued new guidance on AI and data protection, which aims to support companies as they embark on AI projects. It helps companies to identify and assess risks, and to design and implement measures to mitigate those risks. This article offers practical suggestions for companies as they implement the new ICO guidance and build trusted, compliant AI.
More AI, more data and more AI regulation
More companies are using AI to process personal data, in some cases without a comprehensive risk assessment. AI often uses large volumes of personal data. Large scale, complex systems carry specific risks. This has prompted regulators to issue specific guidance on AI.
McKinsey’s 2019 survey of over 2,300 companies showed that 80% had trialled or adopted AI. Of those, 63% reported revenue increases and 44% realised cost savings, but 41% didn’t comprehensively identify and prioritise AI risks. Worryingly, only 30% were working to mitigate privacy risk. McKinsey defined AI in terms of machine learning, excluding rules-based systems or those based on decision trees. Machine learning often grabs the headlines, but other data-driven decision making can have significant impacts on individuals. For example, Ofqual’s system for assigning grades to students whose exams were cancelled because of COVID-19 was a statistical model based on attainment data, not a machine learning system.
Broadly speaking, machine learning systems work by identifying patterns in ‘training’ data, then applying those patterns to make inferences about individuals. For instance, a bank’s AI could analyse historical data on loan repayments to identify factors correlated with defaults. The system could then make a prediction about the likelihood of a new loan applicant defaulting.
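The loan example can be sketched in a few lines of code. This is a minimal illustration, not a production credit model: the historical records, the two features (debt-to-income ratio and missed payments) and the function names are all hypothetical, and logistic regression is just one simple technique chosen for readability.

```python
import math

# Hypothetical historical loans: (debt_to_income, missed_payments, defaulted)
HISTORY = [
    (0.10, 0, 0), (0.15, 0, 0), (0.20, 1, 0), (0.25, 0, 0),
    (0.45, 2, 1), (0.50, 3, 1), (0.55, 2, 1), (0.60, 4, 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, epochs=2000, learning_rate=0.1):
    """Fit a two-feature logistic regression by stochastic gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for dti, missed, defaulted in rows:
            predicted = sigmoid(weights[0] * dti + weights[1] * missed + bias)
            error = predicted - defaulted
            weights[0] -= learning_rate * error * dti
            weights[1] -= learning_rate * error * missed
            bias -= learning_rate * error
    return weights, bias

def predict_default(model, dti, missed):
    """Apply the learned pattern to a new applicant; returns a probability."""
    weights, bias = model
    return sigmoid(weights[0] * dti + weights[1] * missed + bias)

model = train(HISTORY)
low_risk = predict_default(model, 0.12, 0)
high_risk = predict_default(model, 0.58, 3)
print(f"low-risk applicant: {low_risk:.2f}, high-risk applicant: {high_risk:.2f}")
```

The first step (training) finds the pattern in past data; the second (prediction) is the inference about a new individual, which is where most of the data protection questions arise.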
The number of factors an AI model can take into account is staggering. GPT-3, an AI system for natural language processing, uses 175 billion parameters to predict the most likely next word in a piece of text. In a personal data context, Acxiom, a data broker, claims to offer 5,000 data attributes on 700 million individual consumers. Companies can combine data from a broker with internal data (e.g. purchase history) to build even richer individual profiles.
The GDPR focuses on protecting individuals and is technology neutral. It doesn’t distinguish between machine learning, rules-based or other types of processing. Instead, it focuses on the outcomes: profiling, automated decision making, and the risk of harm. The ICO guidance joins a growing list of regulatory interventions specifically on AI. The EU first investigated AI in 2014, and has published a number of documents covering safety and liability, setting the future regulatory direction and advising companies on how to build trustworthy AI systems. In the United States, the National Institute of Standards and Technology (NIST) recently launched a consultation on its draft AI explainability guidance. Specific regulatory guidance on AI is a clear trend; companies should take note.
What are the risks?
Effective risk management helps to build trust and confidence in your AI system, and helps you to comply with the GDPR. The ICO notes that “in the vast majority of cases, the use of AI will involve processing likely to result in a high risk to individuals’ rights and freedoms”. The specific risks involved will depend on the context, including: how the system is designed and built, how and when it is deployed and the impact that using AI has on individuals.
The scale and complexity of AI systems can make it difficult to demonstrate compliance. The ICO guidance provides a useful framework to structure your approach to assessing risk. It highlights three focus areas: the ‘lawfulness, fairness and transparency’ principle, data minimisation and individual rights.
Many AI systems process large volumes of personal data. Processing data on a large scale can cause issues for data minimisation, raise surveillance concerns and might trigger the requirement for a data protection impact assessment (DPIA). Scale also carries a legal risk. Class action lawsuits, such as Lloyd v Google – a legal challenge to Google’s data collection on behalf of four million iPhone users – could result in a significant award for damages. Similarly, the Privacy Collective say that their class action suit against Oracle and Salesforce could cost those companies up to €10 billion. If these cases succeed, they would put the regulators’ power to impose fines of up to €20 million or 4% of annual worldwide turnover, whichever is higher, into perspective. Companies may find themselves at greater risk from legal claims for damages than from regulatory fines.
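A rough calculation shows why the comparison matters. The sketch below assumes the GDPR's upper fine limit of €20 million or 4% of annual worldwide turnover, whichever is higher; the turnover figure and the award per claimant are purely hypothetical and are not taken from any of the cases mentioned.

```python
def max_gdpr_fine_eur(annual_turnover_eur: int) -> int:
    """Upper limit: EUR 20 million or 4% of annual turnover, whichever is higher."""
    return max(20_000_000, annual_turnover_eur * 4 // 100)

def aggregate_damages_eur(claimants: int, award_per_claimant_eur: int) -> int:
    """Total exposure if every member of the class receives the same award."""
    return claimants * award_per_claimant_eur

# Hypothetical company with EUR 1 billion annual turnover.
fine_cap = max_gdpr_fine_eur(1_000_000_000)

# Hypothetical class of four million people, each awarded EUR 100.
damages = aggregate_damages_eur(4_000_000, 100)

print(f"fine cap: EUR {fine_cap:,}; class damages: EUR {damages:,}")
```

Under these assumptions even a modest award per person, multiplied across a large class, is ten times the maximum fine.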
Lawfulness, fairness and transparency
Lawfulness requires that you have a legal basis for processing personal data. In an AI context, that may mean relying on different legal bases at different stages of the system’s life cycle. For instance, you may be able to rely on legitimate interest to use personal data you already hold, with appropriate safeguards in place, to develop an AI model. Deploying the model into a specific context will often require another legal basis. Bear in mind that the requirements for processing special category data (e.g. in a facial recognition system) or automating decision making are particularly stringent.
Fairness requires that your AI system is sufficiently statistically accurate and avoids discrimination, and that you consider individuals’ reasonable expectations. The GDPR requires you to ensure that personal data is accurate and up to date. It empowers individuals to challenge the accuracy of data about them through the right to rectification.
The ICO distinguishes between accuracy in the data protection sense and statistical accuracy, which refers to how closely an AI system’s outputs match ‘correct’ answers as defined in the test data. Making decisions about individuals, such as whether to approve a loan application, on the basis of statistically inaccurate information is problematic. It can undermine trust in the system and may breach the fairness principle in data protection law. It can also be bad for business, as inaccurate decisions could mean missed opportunities for revenue or higher costs.
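Statistical accuracy can be measured by comparing a system's outputs with the known answers in the test data. The sketch below uses hypothetical loan decisions (1 = predicted or actual default) and three common measures; which measure matters most depends on the context, for example whether wrongly refusing a loan is worse than wrongly approving one.

```python
def evaluate(predictions, labels):
    """Compare model outputs with the 'correct' answers in the test data."""
    pairs = list(zip(predictions, labels))
    true_pos = sum(1 for p, y in pairs if p == 1 and y == 1)
    true_neg = sum(1 for p, y in pairs if p == 0 and y == 0)
    false_pos = sum(1 for p, y in pairs if p == 1 and y == 0)
    false_neg = sum(1 for p, y in pairs if p == 0 and y == 1)
    return {
        # Share of all outputs that were correct.
        "accuracy": (true_pos + true_neg) / len(pairs),
        # Of those predicted to default, how many actually did.
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        # Of those who actually defaulted, how many were identified.
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }

# Hypothetical test data.
test_labels = [1, 0, 0, 1, 0, 0, 1, 0]
model_outputs = [1, 0, 0, 0, 0, 1, 1, 0]

metrics = evaluate(model_outputs, test_labels)
print(metrics)
```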
The ICO describes transparency as “being open and honest about who you are, and how and why you use personal data”. In an AI context, this means understanding when and how you use AI decisions and how the AI system itself generates its output, and being able to explain both of these to the individuals affected. You should also provide information on the purposes for which you use personal data and your legal basis for doing so in your privacy notice. A machine learning system that processes thousands of parameters relating to millions of individuals to identify patterns may be difficult to explain clearly. How does each parameter contribute to the model?
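For the simplest kind of model, that question has a direct answer. In a linear scoring model, each input's contribution to an individual's score is its weight multiplied by its value. The weights, feature names and applicant below are hypothetical, and this sketch does not carry over to more complex models, which generally need dedicated explanation techniques.

```python
# Hypothetical weights for a linear credit-risk score.
WEIGHTS = {"debt_to_income": 2.0, "missed_payments": 0.5, "years_at_address": -0.1}
BIAS = -1.0

def explain(applicant):
    """Return the score and each feature's contribution, largest effect first."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return score, ranked

score, ranked = explain(
    {"debt_to_income": 0.5, "missed_payments": 3, "years_at_address": 4}
)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

A ranked list like this can form the basis of a plain-language explanation for the individual affected, such as which factor counted most against their application.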
Transparency also underpins fairness. Understanding how a system works means that you can check for bias. In some contexts, the risk of bias or discrimination triggers further regulatory requirements. For example, the UK’s Equality Act 2010 bans discrimination on the basis of nine protected characteristics, and the New York State Department of Financial Services investigated the Apple Card after allegations that it discriminated against female applicants.
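One basic bias check is to compare outcomes across groups. The sketch below computes approval rates per group and the ratio between the lowest and highest rate. The decisions and group labels are hypothetical, and a low ratio is a prompt to investigate rather than a legal test of discrimination.

```python
from collections import defaultdict

# Hypothetical decisions: (group label, approved?)
DECISIONS = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 5 + [("B", False)] * 5
)

def approval_rates(decisions):
    """Share of applications approved within each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}

def disparity_ratio(rates):
    """Lowest approval rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0

rates = approval_rates(DECISIONS)
ratio = disparity_ratio(rates)
print(rates, round(ratio, 3))
```

Differences in approval rates can have legitimate explanations, so a check like this is a starting point for review, not a conclusion.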
Data minimisation
Big data projects often assume that more data means better AI, implying a tension between AI and the data minimisation principle. The ICO is clear that using data on the basis that it might be necessary breaches the data minimisation principle even if the data, in retrospect, turns out to be useful for the AI project. I’ve argued that effective data minimisation can improve AI.
Individual rights
The ICO recognises that using personal data for AI may make it harder for companies to fulfil individual rights to information, access, rectification, erasure, restriction of processing and notification. Some of these are connected to the principles above; transparency can help you to provide information to the data subject.
In some cases, AI will involve profiling and automated decision making. The GDPR specifically requires companies to provide “meaningful information” about how the AI system generated its output and what that means for the data subject. The ICO’s AI explainability guidance provides much more detail on what explaining AI means in practice.
How can companies respond?
Companies can respond to these challenges by implementing robust data governance. A data governance process helps you to understand your data, think broadly about risks and put the necessary checks and balances in place to manage them.
Training data is crucial to AI systems. Understanding the data on which a model is based can improve transparency and explainability, and help to identify and reduce bias. Most companies curate a training dataset, usually a subset of data they already hold, as a basis for their AI project. Data governance models should include a process for deciding which data is necessary, checking for risks and unintended consequences and deciding how the data should be treated once the model has been developed (e.g. should it be retained for explainability purposes?).
The training data may be de-identified, to mitigate privacy risk to individual data subjects, or may be weighted to ensure that it is sufficiently representative. The ICO’s AI explainability guidance includes a section on collecting and preparing your training data in an explanation-aware manner. This includes documenting decisions so that you can audit them later.
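Both treatments can be sketched briefly. The example below replaces a direct identifier with a keyed hash and computes weights that rebalance a sample towards known population shares. The key, identifiers, group labels and population shares are all hypothetical. Note that a keyed hash is pseudonymisation rather than anonymisation: the output is still personal data under the GDPR, and the key must be stored securely and separately.

```python
import hashlib
import hmac
from collections import Counter

def pseudonymise(customer_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()

def representativeness_weights(sample_groups, population_shares):
    """Weight each group by its population share divided by its sample share."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: population_shares[group] / (counts[group] / total)
        for group in counts
    }

# Hypothetical key; in practice, load it from a secrets manager.
key = b"replace-with-a-secret-key"
token = pseudonymise("customer-12345", key)

# Hypothetical sample in which group "a" is over-represented.
sample = ["a"] * 8 + ["b"] * 2
weights = representativeness_weights(sample, {"a": 0.5, "b": 0.5})
print(token[:12], weights)
```

Records from the under-represented group receive a weight above 1, so they count for more during training, while the over-represented group is weighted down.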
This could become more important as other AI regulations emerge. The European Commission’s AI White Paper proposes that companies should be required to document the training dataset (i.e. its characteristics, what values were selected for inclusion, etc.) and in some cases retain a copy of the training data itself to allow issues with the model’s performance to be traced and understood. The White Paper is a consultation document, so it’s too early to say whether these specific recommendations will become law. However, it’s clear that the trend is towards a greater focus on training data as a key element of building compliant AI systems.
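In practice, documenting a training dataset can be as simple as keeping a structured record alongside it. The sketch below is one possible shape for such a record; the field names and example values are hypothetical and are not prescribed by the White Paper or the ICO.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date

@dataclass
class TrainingDatasetRecord:
    name: str
    purpose: str
    source_systems: list
    selection_criteria: str
    row_count: int
    fields_included: dict      # field name -> why it is necessary
    deidentification: str
    retention_decision: str
    created_on: str = field(default_factory=lambda: date.today().isoformat())

# Hypothetical example values.
record = TrainingDatasetRecord(
    name="loan-default-training-v1",
    purpose="Develop a model to estimate likelihood of loan default",
    source_systems=["loan-ledger"],
    selection_criteria="Closed loans opened in the last five years",
    row_count=120000,
    fields_included={
        "debt_to_income": "Core indicator of affordability",
        "missed_payments": "Direct evidence of repayment behaviour",
    },
    deidentification="Customer IDs replaced with keyed hashes",
    retention_decision="Retain a copy while the model is in use, for explainability",
)

# Serialise the record so it can be stored with the model and audited later.
record_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(record_json)
```

Recording why each field is included also supports the data minimisation assessment, because it forces a decision about necessity before the data is used.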