Algorithms rule our lives. As data science and statistical learning improve, they increasingly intervene in our daily activities – career paths, adverts, recommendations, scoring, online searches, flight prices.
Although initially considered neutral, they are now blamed for biasing results and discriminating against people, voluntarily or not, according to their gender, ethnicity or sexual orientation. In the United States, studies have shown that African American defendants are penalised more heavily in court decisions (Angwin et al., 2016). They are also discriminated against more often on online flat rental platforms (Edelman, Luca and Svirsky, 2017). Finally, online targeted and automated ads promoting job opportunities in the Science, Technology, Engineering and Mathematics (STEM) fields appear to be shown more frequently to men than to women (Lambrecht and Tucker, 2017).
Algorithmic bias raises significant issues in terms of ethics and fairness. Why are algorithms biased? Is bias unpreventable? If so, how can it be limited?
Three sources of bias can be identified: cognitive, statistical and economic. First, an algorithm's results vary according to the way its programmers, who are human, coded it, and studies in behavioural economics have shown that decision-making is subject to cognitive biases.
- For instance, a bandwagon bias may lead a programmer to follow popular models without checking whether these are accurate.
- Anticipation and confirmation biases may lead a programmer to favour their own beliefs, even when the available data challenges them.
- Illusory correlation may lead someone to perceive a relationship between two independent variables.
- A framing bias occurs when a person draws different conclusions from the same dataset depending on how the information is presented.
Second, bias can be statistical. The phrase ‘garbage in, garbage out’ captures the fact that even the most sophisticated machine will produce incorrect, and potentially biased, results if the input data is inaccurate. It is tempting to trust a score produced by a complex proprietary algorithm that seemingly draws on multiple sources. Yet if the dataset on which the algorithm is trained to categorise or predict is partial or inaccurate – as is often the case with fake news, trolls or fake identities – the results are likely to be biased. What happens if the data is incorrect? If the algorithm is trained on data from US citizens, who may behave very differently from European citizens? Or if essential variables are omitted? How, for instance, could an algorithm encode relational skills and emotional intelligence (which machines struggle to measure, as they do not feel emotions), leadership or teamwork? Omitted variables may bias an algorithm's results for the simple reason that they may be correlated with the variables included in the model. Finally, what happens when the training data comes from truncated samples, or is not representative of the population for which predictions are to be made (sample-selection bias)? In his Nobel Memorial Prize-winning research, James Heckman showed that selection bias is related to omitted-variable bias. Credit scoring is a striking example: to determine a borrower's risk category, algorithms rely on data about people who were granted a loan at a particular institution, ignoring the files of people who were denied credit, did not need a loan or obtained one elsewhere.
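The credit-scoring example can be illustrated with a short simulation. The sketch below is entirely hypothetical: the income range, the stylised risk model and the approval cutoff of 50,000 are made-up assumptions, chosen only to show how an estimate computed on an approved-only (truncated) sample understates the risk of the full applicant pool.

```python
import random

random.seed(42)

# Hypothetical applicant pool: incomes drawn uniformly between
# 10,000 and 100,000 (assumed figures, for illustration only).
applicants = [random.uniform(10_000, 100_000) for _ in range(100_000)]

def default_prob(income):
    # Stylised risk model: default risk falls linearly with income,
    # from 40% at the bottom of the range to 10% at the top.
    return 0.4 - 0.3 * (income - 10_000) / 90_000

# Simulate whether each applicant would actually default.
defaults = [random.random() < default_prob(inc) for inc in applicants]

# The bank only observes outcomes for applicants it approved
# (here: income above 50,000), mimicking a truncated training sample.
approved = [(inc, d) for inc, d in zip(applicants, defaults) if inc > 50_000]

rate_full = sum(defaults) / len(defaults)
rate_approved = sum(d for _, d in approved) / len(approved)

print(f"default rate, whole population: {rate_full:.3f}")
print(f"default rate, approved sample:  {rate_approved:.3f}")
```

The approved-only estimate comes out well below the population rate, because approval was correlated with the very variable that drives risk. A scoring model fitted to that sample inherits the understatement when applied to all applicants, which is exactly the selection problem Heckman analysed.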
Third, algorithms may bias results for economic reasons. Consider online automated advisors specialised in selling financial services: they can favour the products of the company providing the advice, at the expense of the consumer, if those products are more expensive than the market average. Such a situation is called price discrimination. Besides, in the context of multi-sided platforms, algorithms may favour third parties that have signed agreements with the platform. In e-commerce, the European Commission recently fined Google €2.4 billion for promoting its own products at the top of Google Shopping search results, to the detriment of competitors. Other disputes have concerned apps being simply delisted from search results on Apple's App Store, or downgraded in marketplaces' search rankings.
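The self-preferencing mechanism can be sketched in a few lines. Everything here is invented for illustration – the product names, the relevance scores and the `HOUSE_BOOST` parameter do not come from any real ranking system – but the sketch shows how a hidden additive boost can push a platform's own product above a more relevant rival.

```python
# Hypothetical sketch of self-preferencing in a ranking algorithm.
HOUSE_BOOST = 0.3  # assumed, undisclosed bonus for the platform's own goods

products = [
    {"name": "rival_widget", "relevance": 0.9, "house_brand": False},
    {"name": "own_widget",   "relevance": 0.7, "house_brand": True},
]

def score(product):
    # The displayed ranking mixes genuine relevance with the hidden boost.
    bonus = HOUSE_BOOST if product["house_brand"] else 0.0
    return product["relevance"] + bonus

ranked = sorted(products, key=score, reverse=True)
print([p["name"] for p in ranked])  # ['own_widget', 'rival_widget']
```

The rival scores higher on relevance alone (0.9 versus 0.7), yet the boosted house brand tops the list – precisely the kind of ordering distortion at issue in the Google Shopping case.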
Algorithms thus come with bias, which seems unpreventable. The question now is: how can bias be identified and discrimination be limited? Algorithms and artificial intelligence will indeed only be socially accepted if all actors are capable of meeting the ethical challenges raised by the use of data and following best practice.
Researchers first need to design fairer algorithms. Yet what is fairness, and which fairness rules should be applied? There is no easy answer: these questions have divided social scientists and philosophers for centuries. Fairness is a normative concept, and many of its definitions are mutually incompatible. Compare, for instance, individual fairness and group fairness. A simple criterion of individual fairness is equal opportunity: the principle that individuals with identical capacities should be treated similarly. This criterion is incompatible with group fairness, according to which individuals of the same group, such as women, should be treated similarly. In other words, equal opportunity for all individuals cannot hold once a fairness criterion is applied to gender: the two notions of fairness are incompatible.
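The incompatibility can be made concrete with a toy example. The scores, groups and thresholds below are invented for illustration; the point is only that when score distributions differ across groups, a single cutoff satisfies individual fairness but not group fairness, while group-specific cutoffs do the reverse.

```python
# Toy example: individual fairness versus group fairness.
# Scores are hypothetical "qualification" measures, for illustration only.
group_a = [55, 60, 65, 70, 75, 80]
group_b = [40, 45, 50, 55, 60, 65]

def select(scores, threshold):
    # Accept every candidate whose score meets the cutoff.
    return [s >= threshold for s in scores]

# One common cutoff of 60 is individually fair (equal scores always
# receive equal outcomes) but the groups' selection rates diverge.
rate_a = sum(select(group_a, 60)) / len(group_a)
rate_b = sum(select(group_b, 60)) / len(group_b)
print(rate_a, rate_b)  # group fairness fails: rates differ

# Equalising the rates instead requires a lower cutoff for group B,
# so a score of 55 is accepted in B but rejected in A:
# individual fairness now fails.
rate_b_equalised = sum(select(group_b, 45)) / len(group_b)
print(rate_a, rate_b_equalised)  # rates now equal
```

Whichever rule is enforced, the other is violated, which is why choosing a fairness criterion is a normative decision rather than a purely technical one.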
A second challenge faces companies, policy makers and regulators, whose duty it is to promote ethical practices – transparency and responsibility – through efficient regulation of the collection and use of personal data. Many issues arise. Should algorithms be transparent and therefore audited? Who should be responsible for the harm caused by discrimination? Is the General Data Protection Regulation fit to address algorithmic bias? How could ethical constraints be built in? Admittedly, such constraints could increase costs at the microeconomic level, yet they could help lower the costs of the unfairness and inequality that would stem from an automated society failing to comply with the fundamental principles of unbiasedness and non-discrimination.
David Bounie, Professor of Economics, Head of Economics and Social Sciences at Télécom ParisTech
Patrick Waelbroeck, Professor of Industrial Economics and Econometrics at Télécom ParisTech and co-founder of the Chair Values and Policies of Personal Information