Why Language & Data Matter: The Fight Against Racism in AI

The recent protests in the US and around the world have highlighted racist representations in the mass media, on the internet and in language in general. AI will reproduce these biases unless we tackle them. Explainability and collaboration with social scientists are both key to winning this fight.

It has also long been clear from the research literature that representations of race, both online and offline, frequently contain common racist tropes and myths. This matters for AI, which will reproduce these biases.

Natural Language Processing systems, for example, learn and extract meaning from the unstructured data of human language as it is actually used. If this data is itself biased, then the inferences drawn by the AI will of course also be tainted.
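To make the mechanism concrete, here is a minimal sketch, assuming scikit-learn, of how a classifier trained on skewed text inherits the skew. The tiny corpus is invented purely for illustration: one name only ever appears in negative sentences, and the model learns the name, not the sentiment.

```python
# A toy demonstration: biased training data in, biased inferences out.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "alice closed the deal",       # positive
    "alice won the award",         # positive
    "bob was arrested yesterday",  # negative
    "bob caused the accident",     # negative
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# A neutral sentence is scored by the name it contains: the model has
# learned a spurious correlation in the corpus, not sentiment itself.
print(model.predict_proba(["alice walked home"])[0][1])  # high probability
print(model.predict_proba(["bob walked home"])[0][1])    # low probability
```

Neither test sentence carries any sentiment at all, yet the predictions diverge, which is exactly the failure mode described above at toy scale.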

GloVe is an algorithm that can surface unseen relationships between words and how they are utilised in a collection of written texts, or corpus. Robyn Speer showed that a sentiment model built on such embeddings scored names associated with the African-American community negatively, while names associated with the white Anglo-Saxon community scored positively. This was because the relationships inferred from the corpus correlated Black names with negative events. In another, infamous example, Google Photos classified photos of African-Americans as gorillas. Although this was likely due to shortcomings in the training data rather than a deliberately racist system, the impact is plain to see. Google's 'fix', simply removing the gorilla label, was also a little distasteful.
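Here is a minimal sketch of this kind of measurement, assuming the gensim library and its downloadable "glove-wiki-gigaword-100" vectors. The word lists and names are illustrative choices, not Speer's exact setup, but the effect they expose is the same.

```python
# Project names onto a crude sentiment axis derived from GloVe vectors.
import gensim.downloader as api
import numpy as np

glove = api.load("glove-wiki-gigaword-100")  # 100-dimensional GloVe vectors

# Build a sentiment direction: mean(positive words) - mean(negative words).
positive = ["excellent", "wonderful", "great", "good", "love"]
negative = ["terrible", "awful", "bad", "horrible", "hate"]
axis = (np.mean([glove[w] for w in positive], axis=0)
        - np.mean([glove[w] for w in negative], axis=0))

def score(word):
    """Cosine similarity between a word and the sentiment axis."""
    v = glove[word]
    return float(np.dot(v, axis) / (np.linalg.norm(v) * np.linalg.norm(axis)))

# Names are sentiment-neutral, yet corpus statistics push them apart.
# (Names missing from the pretrained vocabulary would raise a KeyError.)
for name in ["emily", "matthew", "ebony", "deshawn"]:
    print(name, round(score(name), 3))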

When systems make decisions that affect our lives, bias in data is a serious problem.

In the US, judges now rely on risk-assessment algorithms such as COMPAS to inform sentencing. Those risk scores have been shown to be racially biased. Companies often use AI to assist with hiring decisions, and those systems have proven sexist. These systems are life-changing. And because they are proprietary, they are not open to scrutiny.

This serious problem of bias in data has attracted attention from data advocates as well as researchers. The rise of explainable AI may help to mitigate bias by providing explanations for inferences, which can then be audited for decisions based upon protected characteristics. However, system designers must embrace openness to allow auditing of their systems and labelled data. The ultimate answer is for computer scientists to pair up with social scientists to design fair systems; otherwise the ML revolution is just encoding human bias into a model.
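To show the kind of audit explainability enables, here is a minimal sketch assuming scikit-learn. The tiny "hiring" dataset is invented for illustration: historical decisions penalised a protected group, and inspecting the trained model's weights reveals that the bias has been learned.

```python
# Audit a model's explanation (here, linear coefficients) for reliance
# on a protected characteristic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Features: years of experience, test score, and a protected attribute.
experience = rng.normal(5, 2, n)
test_score = rng.normal(70, 10, n)
protected = rng.integers(0, 2, n)  # e.g. a demographic group flag

# Historically biased labels: the protected group was hired less often
# even at equal qualification, so the training data encodes the bias.
logits = 0.5 * experience + 0.05 * test_score - 1.5 * protected - 6
hired = (logits + rng.normal(0, 1, n)) > 0

X = np.column_stack([experience, test_score, protected])
model = LogisticRegression().fit(X, hired)

# A materially non-zero weight on the protected attribute is a red flag
# that the model is reproducing the historical bias.
for name, coef in zip(["experience", "test_score", "protected"], model.coef_[0]):
    print(f"{name:>10}: {coef:+.2f}")
```

This is precisely the check that proprietary, closed systems prevent: without access to the model, its features and its labelled data, no outside party can run it.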