Machine learning has been an incredible breakthrough in the past quarter-century. From diagnostic decisions in hospitals to loan approvals in banking, the decisions made by machine learning algorithms have major implications for our personal lives, and they allow us to process larger quantities of data faster than ever before.
While machine learning and associated artificial intelligence inarguably provide many benefits, unfortunately, they are also prone to error through biases, just like humans. Once we understand how algorithms are designed, we realise that these algorithms that we place so much trust in will sometimes get things wrong.
The Design of Machine Learning Algorithms
The level of complexity that machines can model is quite incredible, giving human beings the processing power to sort and analyse data faster than we ever could have imagined just 20 years ago. Yet while they’ve come a long way, we must understand that machine learning algorithms will always need a designer - someone to set the logic that the code will follow.
“Computers are good at following instructions, but not at reading your mind.” Donald Knuth (1984)
From idea to conception, human beings are the initial designers who set the rules and parameters. It’s not hard, then, to understand how biases can creep into algorithms in the first place; if humans are prone to biases, and humans are the ones creating the rules, then biases can and will creep in. Whether through the parameters in the code itself or through the datasets that the systems are fed, machines are rarely free from bias. How critical the impact is depends on how severe the bias is.
At this stage it is completely unfair to solely blame the developers - often pressures come from senior management at a company to produce a model quickly and efficiently. Or perhaps biases slip unknowingly into the datasets that are used in training and testing.
Yet it is critical that data scientists and managers are aware of the limitations of machine learning models, and that all parties realise that algorithms are only as good as the people who develop them and the data that goes into them. Whether they are senior executives or developers, at some point the initial designers must be held responsible.
When no one is held to account, we are left with systems that are unfair: unfair to some groups of people, and unfair to the everyday people who are unable to change the models yet feel the impact of their decisions.
Biases in Algorithms
As Cathy O’Neil points out in her book Weapons of Math Destruction, machine learning models are an “abstract representation of some process”, which by their very nature means they are simplifications of the situations they are trying to predict.
Machine learning largely operates through example-based learning in order to try and predict future outcomes. Yet in very complex processes it is virtually impossible to see every possibility, meaning simplifications and shortcuts must be made. Particularly if faced with cost-saving or efficiency pressures, biases can start to creep in where these interpretations are made. Whether by accident or not, if the model reflects the mental shortcuts of its designer, then in terms of bias it may not be better than the humans who created it.
Once the rules are set, the machine learning model will follow them. In many instances, only a select few are able to understand, let alone change, the rules, so these biases will often go undetected and unaltered. The rise in open-source development is perhaps a step in the right direction in this regard - a collaborative effort where developers can improve the source code and share ideas and suggestions. Further, open-source development not only allows people to collaborate more easily but also puts pressure on developers to look at each other’s code. Following the same logic, development teams have matured their own practices, introducing peer review of work to ensure better quality and adherence to codes of ethics.
One major source of bias in machine learning is stability bias, the idea that what has been made already or what has happened before is likely to be the same in the future. Resource pressures further perpetuate this type of anchoring prejudice, preventing developers from obtaining ideal datasets that may drive more accurate predictions.
Cathy O’Neil points out that an essential component of a good model is constant feedback, which is used to adjust the model, identify errors and ensure that results are accurate and unbiased.
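One way to picture this kind of feedback is to compare a model’s predictions with what actually happened, broken down by group. The snippet below is only a minimal sketch of that idea: the function name `error_rate_by_group`, the group labels and the feedback log are all hypothetical and invented for illustration, and a real review would use far more data and more careful metrics.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Share of wrong predictions per group.

    Each record is a tuple: (group, predicted_label, actual_label).
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical feedback log: (group, model's prediction, what really happened)
feedback_log = [
    ("group_a", "repaid", "repaid"),
    ("group_a", "repaid", "repaid"),
    ("group_a", "default", "repaid"),
    ("group_a", "default", "default"),
    ("group_b", "default", "repaid"),
    ("group_b", "default", "repaid"),
    ("group_b", "repaid", "repaid"),
    ("group_b", "default", "default"),
]

rates = error_rate_by_group(feedback_log)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of predictions were wrong")
```

In this made-up log the model is wrong twice as often for one group as for the other, which is the sort of gap that regular feedback is meant to surface before it goes unnoticed.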
Biases in Data
As well as in the logic of the code, shortfalls in the initial data itself can feed biases into a system. Typically, machine learning algorithms are only given a subset of data. If that subset is not representative of the entire population, the result may be confirmation bias, where a small selection of examples helps to justify the assumptions that the model was built on in the first place.
To counter this, the more data that is fed into the model, the better. More data allows models to generalise better and learn more robust patterns. It is also important that new information is regularly fed into the model, otherwise a model made yesterday will very quickly become stale - so large, up-to-date datasets are important to ensure fairness across the model’s lifetime.
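A simple first check on whether a training subset resembles the wider population is to compare group shares in the sample with known shares in the population. The following is a rough sketch under stated assumptions: the helper `representation_gaps`, the group names and all of the numbers are hypothetical, chosen purely to illustrate the comparison.

```python
from collections import Counter


def representation_gaps(sample, population_shares):
    """Return (sample share - population share) for each group.

    A positive value means the group is over-represented in the sample,
    a negative value means it is under-represented.
    """
    counts = Counter(sample)
    total = len(sample)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical population shares and training sample, for illustration only.
population_shares = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}
training_sample = ["urban"] * 70 + ["suburban"] * 25 + ["rural"] * 5

gaps = representation_gaps(training_sample, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

Here the invented sample over-represents urban records and under-represents rural ones, so a model trained on it would see few examples of the group it is most likely to misjudge.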
Correlations often introduce biases into models, too. For example, real estate information may be used to assess a person’s creditworthiness, on the assumption that those who live in a more desirable neighbourhood are more likely to be responsible borrowers. However, it should not be the case that your credit score suffers simply because you don’t live in a wealthier neighbourhood. This is an unfair proxy which hurts many individuals’ ability to get access to credit products.
Correlations like these place people in predetermined ‘buckets’ rather than treating them as individuals. While it is possible that many individuals in an area will be of similar creditworthiness, this does not mean that every person is the same - the actual past behaviour of the individual should be taken into account, not what similar people in similar situations have done. Moreover, this creates a never-ending cycle of negative outcomes. Once an individual has been categorised in a certain way, for example given a negative credit score based on where they live, it is really hard to get out of this situation. Limited data can create dangerous self-perpetuating feedback loops.
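The difference between scoring the ‘bucket’ and scoring the individual can be shown with a toy example. Everything in this sketch is an assumption made for illustration: the `Applicant` fields, the neighbourhood names, the repayment rates and both scoring functions are hypothetical and do not describe how any real credit model works.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    neighbourhood: str
    payments_on_time: int
    payments_total: int


# Hypothetical neighbourhood-level repayment rates (the proxy).
NEIGHBOURHOOD_RATE = {"Northside": 0.95, "Southside": 0.70}


def bucket_score(applicant):
    """Score based only on where the applicant lives."""
    return NEIGHBOURHOOD_RATE[applicant.neighbourhood]


def individual_score(applicant):
    """Score based only on the applicant's own repayment history."""
    return applicant.payments_on_time / applicant.payments_total


# Two invented applicants with identical repayment histories.
alice = Applicant("Alice", "Northside", 46, 48)
bob = Applicant("Bob", "Southside", 46, 48)

for person in (alice, bob):
    print(
        f"{person.name}: bucket={bucket_score(person):.2f}, "
        f"individual={individual_score(person):.2f}"
    )
```

The two applicants have behaved identically, yet the proxy-based score separates them by postcode alone, while the score built on their own history treats them the same.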
Finally, loss aversion also plays a role in many models. Growing revenue can often be confused with a successful model, where a rise in profits is understood as the model being on the right track. The money ends up serving as a stand-in for the truth, with few wishing to change the financially successful model.
Responsibility of the Designer
There are many types of biases in models, and it is almost impossible to create a model with absolutely no bias whatsoever. Yet these biases are not necessarily detrimental to huge swathes of people. Many successful models are used every day in fields such as finance, medicine and logistics, improving and often saving lives. The achievements that have been made thanks to machine learning and artificial intelligence are quite amazing.
But we must be careful with the processes that we create. Already many models are unfair to some groups of people due to small but impactful biases in the algorithms, and it is our responsibility to correct these.
The removal of bias in machine learning starts with awareness. It is the responsibility of the designers and managers to know the limitations of the model they have created and to make sure that users know too.
The second step is to attempt to remove bias from the data itself. While biases can be introduced when developing the code, one must not forget the training data and the impact it may have on how the model is shaped.
While some simplifications are needed to create a model which is fit for purpose, we must also take care with the type and quantity of data that the model processes, the initial rules and algorithms that the machines follow, and the frequency and availability of updates and improvements. Together, these three areas (data quality, design rules and transparency) can help to reduce bias in machine learning.