The hottest job in data science could soon be a philosopher who guides data science practitioners through the murky field of data ethics. Simply put, data ethics is a sub-field of ethics that defines a set of rules stating what counts as moral behaviour when working with data. Moral behaviour in data science is not simply an academic exercise that companies pay lip service to. It directly affects your bottom line as well as the public perception of your company. Facebook has found this to its cost. Conducting experiments that manipulated users' behaviour, and opening up its data so that third parties could manipulate elections, has had a direct effect on its user base and its reputation in the market.
What's data ethics?
To misquote George Bernard Shaw: “If all the philosophers were laid end to end, they'd never reach a conclusion”. This is especially true in the field of ethics, where philosophers argue about what constitutes ethical behaviour. This inertia within philosophy does not mean there have been no attempts to define ethical behaviour when dealing with data. The Royal Society, for example, published a special issue of one of its journals dedicated to the ethics of data science. Although the notion of data ethics may not be widely known, data scientists and researchers have taken it upon themselves to decide which data science projects are ethical. For example, Google employees refused to work on military drone projects, and Microsoft employees refused to work on military-related projects.
Professor Luciano Floridi of the University of Oxford has formally divided data ethics into three areas:
- the ethics of data (how data is generated, recorded and shared)
- the ethics of algorithms (how artificial intelligence, machine learning and robots interpret data)
- the ethics of practices (devising responsible innovation and professional codes to guide this emerging science)
Ethical data collection should be the starting point of any data ethics policy. A number of questions should be asked before starting any data collection project. One example is whether you have obtained consent from the people whose data you are collecting. Well-known companies have breached this basic ethical consideration. Microsoft, for example, scraped images of some 100,000 people from the Internet to use in training facial recognition systems. Obtaining consent from every person whose images are scraped would be difficult. However, website operators use a file called robots.txt to indicate which parts of a website may be scraped. An ethical scraping policy must obey a website's robots.txt. When using private data, you must obtain the active consent of the person whose data has been captured.
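Below is a minimal sketch of how a scraper might respect robots.txt. It is written in Python and uses the standard library's `urllib.robotparser` module. Several names here are hypothetical and exist only for illustration: the robots.txt content, the `ExampleBot` user agent, the example.com URLs and the `is_scraping_allowed` helper. A real scraper would not parse a hard-coded string. It would load the site's live file with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only. In practice you would
# call parser.set_url("https://<site>/robots.txt") and parser.read() instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def is_scraping_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a sequence of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/photo.jpg",
                "https://example.com/private/photo.jpg"):
        allowed = is_scraping_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url)
        print(url, "->", "allowed" if allowed else "disallowed")
```

A scraper would call a check like this before each request and skip any URL that comes back as disallowed.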
The ethics of algorithms has been in the news recently, after a number of systems were found to be biased against ethnic minorities. It is unlikely that the algorithms themselves are intentionally biased. The more likely problem is that the data they are trained on is skewed. Skewed data can have unfortunate side effects. For example, a facial recognition system trained mostly on lighter-skinned faces may be less accurate for people with darker skin. Any ethical AI project must therefore consider the bias within its datasets. There are a number of methods that can help data scientists reduce it.
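One published pre-processing method is known as reweighing. It gives each training example the weight P(group) × P(label) / P(group, label). Combinations of group and outcome that are over-represented in the data are weighted down, and under-represented combinations are weighted up. The Python sketch below illustrates one possible approach, not the only method or the one any particular team uses. The toy data and the helper names `positive_rate_by_group` and `reweighing_weights` are hypothetical. The resulting weights would then be passed to any training routine that accepts per-sample weights.

```python
from collections import Counter


def positive_rate_by_group(groups, labels, weights=None):
    """Share of positive labels (1) in each group, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(labels)
    totals, positives = Counter(), Counter()
    for g, y, w in zip(groups, labels, weights):
        totals[g] += w
        if y == 1:
            positives[g] += w
    return {g: positives[g] / totals[g] for g in totals}


def reweighing_weights(groups, labels):
    """Per-example weights w(g, y) = P(g) * P(y) / P(g, y).

    Over-represented (group, label) pairs get weights below 1 and
    under-represented pairs get weights above 1, so that group membership
    and label look statistically independent in the weighted data.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # Algebraically equal to (gc/n) * (lc/n) / (jc/n).
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: group "A" receives far more positive labels than "B".
    groups = ["A"] * 6 + ["B"] * 4
    labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]
    print(positive_rate_by_group(groups, labels))           # skewed: A ~0.67, B 0.25
    weights = reweighing_weights(groups, labels)
    print(positive_rate_by_group(groups, labels, weights))  # both about 0.5
```

Reweighing leaves the data itself untouched and only changes how much each example counts during training. The trade-off is that it addresses a single, simple notion of fairness: equal positive rates across groups.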
The ethics of practices is an emerging area where best practices are being set up by bodies such as the European Union and the United Kingdom Government.
Your competitive edge
Data ethics may seem like a subject for dusty professors in ivory towers, but it can give a business a competitive edge. It can help retain skilled workers, and it can also protect your company from potential legal action and negative publicity. The activities of Cambridge Analytica destroyed the firm itself and also wiped $100bn off the value of Facebook. Unethical AI and ML projects will eventually damage your company's profitability.