Most of us will have seen good news stories in the press about AI being useful in cancer diagnosis, with intelligent systems giving a 'yes' or 'no' response on a potential diagnosis based on pattern recognition. The AI is trained using pictures of potential melanomas, for example, so that the system learns what skin cancer looks like. According to a study in early 2020, these kinds of algorithms appear to be outperforming the human experts—so far, so good! However, can we really trust this system? To do that, we need to be sure that it is fair and free from unwanted bias.
So where could bias come from when using a sample dataset to train an AI? It could be selection bias, where the sample used to train an AI is 'bad' in that it doesn't reflect the whole population. This could be because the sample size was too small, or was itself taken from a subset of the whole. When the data relates to people, it might be that it is 'good' in the sense that it reflects society, but 'bad' because society itself is biased. We might get survivorship or confirmation bias, if those selecting the training data exclude certain examples because they don't fit particular parameters (including conscious or unconscious preconceptions). Or measurement bias, where the instruments used in collecting data influence it. Outliers can also skew the data—for example, if the dataset is looking at average household spend, should we include the Royal family?
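To make the outlier point concrete, here is a minimal sketch (with entirely invented numbers) of how a single extreme value can drag an 'average household spend' figure far away from anything typical:

```python
# Hypothetical illustration: a single extreme outlier skews the mean.
# All figures are invented for the example.
from statistics import mean, median

# Weekly spend (GBP) for nine ordinary households plus one extreme outlier
spends = [420, 455, 390, 510, 480, 430, 405, 465, 445, 250_000]

print(f"mean:   {mean(spends):.0f}")    # dragged far above typical spend
print(f"median: {median(spends):.0f}")  # robust to the single outlier
```

Whether to exclude such an outlier, or switch to a robust statistic like the median, is exactly the kind of judgement call where bias can creep into dataset preparation.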
When we think about bias we often think about mistakes being made in the design of the system, but that's not to say that bias couldn't be introduced maliciously. For example, if the dataset to be processed is generated based on sentiment analysis, who is to say that the sentiment expressed is genuine? We've already seen businesses posing as their own customers to leave false positive reviews, and false negative reviews of their competitors. And allegations of the use of 'bots' by foreign powers to influence public opinion abound. If an AI can't distinguish between real and 'fake news', then it is vulnerable to being deliberately misled.
Whilst breast cancer diagnosis is a real-life (and seemingly successful) use-case for AI, let's use a hypothetical breast cancer diagnosis system just to illustrate the issues. Firstly, if all the images are of human breast tissue then there is already a degree of self-selection present in the dataset.
Has the AI been trained to flag exceptions, or does it always 'expect' to see a correctly taken mammogram? Let's assume that the absence of a firm positive is treated as a negative, and that it is unable to distinguish between a mammogram and a femoral x-ray accidentally introduced into the system. Without an exception process the system would give a negative reading for the patient without having examined the correct image.
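The design point here can be sketched in a few lines. This is a hypothetical illustration (the function name, labels, and threshold are all invented): unexpected input should be routed to human review, never silently reported as a negative.

```python
# Hypothetical sketch of an exception path: inputs the model was not
# trained on are flagged for human review, not defaulted to 'negative'.
def diagnose(image_type: str, model_score: float) -> str:
    if image_type != "mammogram":
        # e.g. a femoral x-ray accidentally introduced into the system
        return "flag_for_review"
    # only score images the model is actually designed for
    return "positive" if model_score >= 0.5 else "negative"

print(diagnose("femoral_xray", 0.1))  # flag_for_review, not negative
print(diagnose("mammogram", 0.7))     # positive
```

The key design choice is that "I can't assess this" is a distinct outcome from "no cancer detected".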
Where did the training images come from? In 2011, around 86% of the population of England and Wales was white, of which around 51% were female. If all the training images were mammograms from white women, and the system consequently failed to detect cancer in men or in women of other ethnicities, that would strike me as a real problem.
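A toy simulation can show how this plays out. The distributions below are entirely invented and have no medical meaning; the point is only that a detection threshold tuned on one group's data can miss cases in a group that was absent from training:

```python
# Hypothetical sketch of selection bias: a detector whose threshold
# was tuned on one subgroup only. All distributions are invented.
import random

random.seed(0)

def readings(n, signal_mean):
    """Simulate n 'positive case' readings centred on signal_mean."""
    return [random.gauss(signal_mean, 1.0) for _ in range(n)]

group_a = readings(1000, 3.0)  # the group the system was trained on
group_b = readings(1000, 1.5)  # presents differently; absent from training

THRESHOLD = 2.0  # chosen to work well for group A

recall_a = sum(x > THRESHOLD for x in group_a) / len(group_a)
recall_b = sum(x > THRESHOLD for x in group_b) / len(group_b)
print(f"detection rate, group A: {recall_a:.0%}")
print(f"detection rate, group B: {recall_b:.0%}")
```

The same fixed threshold catches most cases in group A but misses the majority in group B, even though both groups have exactly the same number of genuine cases.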
Was the same equipment used when taking all of the training images? What if the system can only recognise cancers on images from the scanner used for the training images, and it fails when an image is generated slightly differently?
We must seek to eliminate bias completely where it may lead to an unfair outcome. In fact, some have suggested that the use of AI could even improve societal fairness if we can avoid unwanted bias. But we need to be mindful of how we do this. In the medical diagnosis example, certain illnesses are more prevalent in certain populations than in others—for example, because of genetic predisposition. So it isn't just a case of ignoring racial data, it's about identifying where the data may lead to unfairness.
As I alluded to earlier, achieving fairness relies on us recognising that bias doesn't only emerge when datasets don't reflect society, but also when datasets accurately reflect unfair aspects of society. Amazon took the decision to stop using a recruitment tool that used machine learning to decide who to hire. The tool sifted through applications, based on its analysis of patterns in CVs submitted to Amazon over the previous 10 years, and made hiring recommendations. Because most people applying to technology roles in that period were male, the AI erroneously concluded that male candidates were preferable to female candidates. We've also seen AI displaying racial and gender bias in predicting which criminals may become repeat offenders.
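The Amazon example can be reduced to a toy sketch. The data below is entirely invented (the CV keywords stand in for any feature that correlates with a protected characteristic): a model that simply reproduces historical hiring rates as 'merit' scores will reproduce the historical bias, even if actual skill is identical across groups.

```python
# Hypothetical sketch: a model fit to historically biased hiring
# decisions learns the bias. All data is invented for illustration.

# Past decisions as (cv_keyword, hired) pairs. Skill is assumed equal
# across groups, but historical hiring favoured one keyword's group.
history = ([("chess_club", True)] * 80 + [("chess_club", False)] * 20
         + [("netball_team", True)] * 30 + [("netball_team", False)] * 70)

def hire_rate(keyword):
    """A naive 'model': score candidates by the historical hire rate."""
    outcomes = [hired for kw, hired in history if kw == keyword]
    return sum(outcomes) / len(outcomes)

print(hire_rate("chess_club"))    # 0.8
print(hire_rate("netball_team"))  # 0.3
```

The dataset here is 'good' in the sense that it faithfully records what happened, and 'bad' in the sense that what happened was unfair; the model cannot tell the difference.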
The innate subjectivity of developers can also introduce unwanted bias, and many of us will have seen the viral video of the 'racist' soap dispenser. We can help to reduce these biases by ensuring that developers of AI come from diverse gender, ethnic, and socio-economic backgrounds, and belief systems. They also need to be trained to be aware of—and stick to—ethical codes of conduct. A corollary of following this course will be to help reduce inequality in our workforce more generally.
Whilst it is incumbent on Providers to make sure they are developing AI in a way that avoids unwanted bias, it is not just their problem. Better education and training will be required to ensure that society has the expertise and capability to identify bias in its various forms and seek to eliminate it. We will also need to understand our own innate subconscious biases if we are to remove from society the very prejudices we are asking AI not to perpetuate.