Artificial intelligence has become so ubiquitous these days that we barely realize when we’re using it. Sophisticated algorithms help Siri locate the nearest grocery store and tell us what movies to watch; they also determine what ads we see and whether or not we’re given a bank loan. Sure, AI is making our lives better—but it’s also come under fire for having a discrimination problem. And as its ubiquity increases, it’s vital that we make sure AI isn’t leaving swathes of people behind.
Still, some of the worries about AI’s discrimination problem seem to ignore the fact that the trust mechanisms currently in place are already highly discriminatory. Employment credit checks prevent qualified candidates from getting jobs, background checks rely on incomplete and inaccurate information, and the use of credit scores perpetuates racial discrimination.
AI could actually be the solution to discrimination in these systems, by offering the potential to identify and eliminate biases. When a human being or manual process makes a decision, we don’t have any way to rigorously examine the decision-making inputs and heuristics, particularly when those decisions rely on instinct. However, algorithms can be deconstructed, enabling us to examine both what has been learned about a dataset and the process used to learn it.
Models That Learn Bias Can Un-Learn Bias
It’s well established that human beings make decisions subjectively, due to a host of cognitive, personal, and social biases. So it shouldn’t surprise us that algorithms can learn biases from data generated by humans. Learned bias can arise from incomplete data or from researcher bias in assembling the training data. Both can be mitigated with larger and more representative datasets, a correction that happens constantly in AI projects. An app built to detect potholes in Boston skewed its data toward wealthier neighborhoods (where more people owned smartphones) until it was installed on garbage trucks, which collected data from all over the city.
Machine learning can still target bias even when the data can’t be improved. In their work with word2vec embeddings trained on a large body of news content, researchers at Boston University and Microsoft Research found that the embeddings encoded gender-biased stereotypes like “father is to doctor as mother is to nurse” and “man is to engineer as woman is to homemaker.” Once the researchers had identified this bias, however, they were able to write and implement an additional algorithm to “de-bias” the embeddings.
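Their fix amounts to removing a learned “gender direction” from words that should be gender-neutral. Below is a minimal sketch of that idea, using made-up toy vectors in place of the real news-trained embeddings; the words and numbers are illustrative only.

```python
import numpy as np

# Toy stand-ins for real word2vec embeddings (values are made up for illustration).
vecs = {
    "he":     np.array([ 1.0, 0.2, 0.5]),
    "she":    np.array([-1.0, 0.2, 0.5]),
    "doctor": np.array([ 0.4, 0.9, 0.1]),
    "nurse":  np.array([-0.4, 0.9, 0.1]),
}

# Estimate a "gender direction" from a definitional pair of words.
gender_dir = vecs["he"] - vecs["she"]
gender_dir /= np.linalg.norm(gender_dir)

def debias(vector, direction):
    """Remove the component of a vector that lies along the bias direction."""
    return vector - np.dot(vector, direction) * direction

# Profession words should be gender-neutral, so project the gender component
# out of them while leaving inherently gendered words like "he"/"she" alone.
for word in ("doctor", "nurse"):
    vecs[word] = debias(vecs[word], gender_dir)

# After de-biasing, "doctor" and "nurse" sit at equal similarity to "he" and "she".
print(np.dot(vecs["doctor"], vecs["he"]), np.dot(vecs["doctor"], vecs["she"]))
```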
Bias can also be corrected by adjusting the criteria an algorithm uses to optimize its decisions. ProPublica reports that the risk-assessment models currently used by many courts and parole boards could be revised to reduce the number of people unfairly categorized as high-risk, without sacrificing the ability to predict future crimes. Because sentencing systems are built on historical data, and black people have historically been arrested and convicted at higher rates, an algorithm could be designed to correct for the bias that already exists in the system.
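One way such a correction is often operationalized is to tune decision cutoffs separately by group, so that people who would not go on to reoffend are flagged as high-risk at roughly the same rate everywhere. The sketch below applies that idea to synthetic scores; it is a generic illustration of threshold adjustment, not ProPublica’s analysis or any court’s actual model.

```python
import numpy as np

def false_positive_rate(scores, reoffended, threshold):
    """Share of people who did NOT reoffend but were still labeled high-risk."""
    flagged = scores >= threshold
    did_not_reoffend = ~reoffended
    return flagged[did_not_reoffend].mean() if did_not_reoffend.any() else 0.0

def pick_threshold(scores, reoffended, target_fpr):
    """Choose the cutoff whose false positive rate lands closest to the target."""
    candidates = np.linspace(0.0, 1.0, 101)
    fprs = np.array([false_positive_rate(scores, reoffended, t) for t in candidates])
    return candidates[np.argmin(np.abs(fprs - target_fpr))]

# Hypothetical risk scores (0..1) and observed outcomes for two groups, where
# group B's scores run systematically higher than its underlying behavior warrants.
rng = np.random.default_rng(0)
scores_a = rng.random(1000)
scores_b = np.clip(rng.random(1000) + 0.15, 0.0, 1.0)
reoffend_a = rng.random(1000) < scores_a
reoffend_b = rng.random(1000) < (scores_b - 0.15)

# Instead of one global cutoff, pick group-specific cutoffs so that people who
# would not reoffend are wrongly flagged as high-risk at the same target rate.
target = 0.20
print("group A cutoff:", pick_threshold(scores_a, reoffend_a, target))
print("group B cutoff:", pick_threshold(scores_b, reoffend_b, target))
```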
An Algorithm’s Work is Never Done
When AI gets something wrong, the criticism is swift and the judgment harsh. Take Google’s image recognition software, which initially mislabeled black people as gorillas. Hard to recover from a blunder like that—at least, if you’re a human.
But machine models don’t work that way. Models like Google’s image recognition system grow increasingly accurate as they are fed more data, but they’re never actually done, and should never be viewed as such. There are always mistakes in a model, no matter how advanced it is. The goal for data and computer scientists isn’t to birth a mistake-free model, but rather to build an increasingly accurate one by rapidly and efficiently iterating on those mistakes. This is where the human-in-the-loop comes into play: even giants like Google and Pinterest supplement their AI with humans who help the machine build its (artificial) intelligence. When Google engineers realized what was happening with their image recognition model, they were able to swiftly address the bias and retrain the model.
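Concretely, a human-in-the-loop cycle often looks something like the sketch below: the model surfaces the predictions it is least confident about, people supply corrected labels, and those corrections are folded into the next round of training. The classifier, confidence margin, and simulated labels here are hypothetical stand-ins, not a description of Google’s or Pinterest’s internal pipelines.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical starting point: a classifier trained on whatever labeled data exists.
rng = np.random.default_rng(1)
X_train, y_train = rng.random((200, 5)), rng.integers(0, 2, 200)
model = LogisticRegression().fit(X_train, y_train)

def needs_human_review(probabilities, margin=0.15):
    """Flag the predictions the model is least confident about."""
    return np.abs(probabilities[:, 1] - 0.5) < margin

# New, unlabeled examples arrive; route the uncertain ones to people.
X_new = rng.random((50, 5))
uncertain = needs_human_review(model.predict_proba(X_new))

# In production these would go to human reviewers; here we simulate their answers.
human_labels = rng.integers(0, 2, uncertain.sum())

# Fold the human-corrected examples back into the training set and retrain,
# so the model iterates on exactly the cases it handled worst.
X_train = np.vstack([X_train, X_new[uncertain]])
y_train = np.concatenate([y_train, human_labels])
model = LogisticRegression().fit(X_train, y_train)
```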
Perhaps ironically, people—in all their human cleverness—are part of what keeps models evolving. It’s a cat and mouse game: as models deal more and more with people, people become better and better at gaming the models. Take anti-spam and anti-virus models, for example, which must regularly be shored up against increasingly sophisticated human tactics.
How to Fight Algorithm Aversion and Bias
While it is important to examine algorithms for bias, it’s also essential to ask how an algorithm’s performance compares to the alternatives. The problem is, we’re not very good at evaluating this ourselves. University of Chicago researcher Berkeley Dietvorst has demonstrated that people will avoid using algorithms that make errors, even in cases where statistical forecasts perform better than human forecasts. When humans err, we tend to rationalize their shortcomings and forgive their mistakes—they’re only human!—even if the bias displayed by human judgment is worse than the bias displayed by an algorithm.
This aversion to algorithms comes from not being able to see how these complex mechanisms work. We can recognize why people behave a certain way on an intuitive level, but we don’t have an empathic understanding when it comes to machines. But transparent understanding of how an algorithm arrived at a certain conclusion is an important part of holding these systems accountable. Without a demonstration of the logic being employed, we may never feel completely comfortable relying on algorithms to make decisions for us. In a follow-up study, Dietvorst shows that algorithm aversion can be reduced by giving people control over an algorithm’s forecast. Participants who were given even slight amounts of visibility and control were “more satisfied with the process, more likely to believe that the algorithm was superior, and more likely to choose to use an algorithm to make subsequent forecasts.” Guru Banavar suggests designing AI systems to explain rationale through a conversational interaction, rather than a report.
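For models whose internals are simple enough to expose, that kind of transparency and control is easy to prototype. The toy example below (hypothetical feature names and weights, unrelated to Dietvorst’s or Banavar’s actual materials) breaks a linear forecast into per-feature contributions and then lets a person nudge the result within a small band, echoing the “slight control” manipulation from the follow-up study.

```python
import numpy as np

# Hypothetical linear forecasting model: weights and inputs are made up.
feature_names = ["income", "debt_ratio", "years_employed"]
weights = np.array([0.6, -1.2, 0.3])
applicant = np.array([0.8, 0.4, 0.5])

# Show the user why the algorithm arrived at its number, feature by feature.
contributions = weights * applicant
forecast = contributions.sum()
for name, value in zip(feature_names, contributions):
    print(f"{name:>15}: {value:+.2f}")
print(f"{'model forecast':>15}: {forecast:+.2f}")

# Give the user slight control: accept a manual adjustment, but keep the final
# number within 10% of the algorithm's forecast.
human_adjustment = 0.05
band = 0.10 * abs(forecast)
final = float(np.clip(forecast + human_adjustment, forecast - band, forecast + band))
print(f"{'final forecast':>15}: {final:+.2f}")
```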
In addition to examining bias in algorithms, it is critical to examine the bias that creeps in when humans use an algorithm’s results to make decisions. Often, the superficial consumption of a model’s output is the real problem. Take credit scores: a person with a 725 FICO score will get a better loan than one with a 685 score. But what factors led to the difference between those scores? Did the 685 applicant just land a great job after leaving addiction treatment? Did the 725 applicant just get divorced, and is thus about to see a drop in financial capacity? The score, no matter how bias-free, does not suffice; the user of a model must take the time to dig into the underlying factors and context in order to make a truly unbiased decision. This requires an additional layer of bias-free decision-making rules, principles, training, and metrics for the human consumers of models, not just for the models themselves.
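As a sketch of what that extra rule layer might look like, the code below routes score-driven loan decisions to human review whenever the score sits near the cutoff or recent context suggests the score may be stale; the applicant fields, cutoff, and review criteria are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    fico: int
    months_at_current_job: int
    recent_major_life_event: bool  # e.g. divorce, medical leave, addiction recovery

def loan_decision(applicant: Applicant, approve_at: int = 700) -> str:
    """Use the score as one input, not as the decision itself."""
    near_cutoff = abs(applicant.fico - approve_at) <= 25
    context_may_lag_score = (
        applicant.recent_major_life_event or applicant.months_at_current_job < 6
    )
    # Close calls and stale-looking scores go to a person who can weigh context.
    if near_cutoff or context_may_lag_score:
        return "manual review"
    return "approve" if applicant.fico >= approve_at else "decline"

# The 725 applicant who just divorced and the 685 applicant near the cutoff
# both get a human look rather than an automatic verdict.
print(loan_decision(Applicant(fico=725, months_at_current_job=2, recent_major_life_event=True)))
print(loan_decision(Applicant(fico=685, months_at_current_job=18, recent_major_life_event=False)))
```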
So let’s not wring our hands about discrimination in AI models and forget that the alternative of human decision-making in manual processes is much worse. Algorithms may give us the potential to amplify, quantify, and address societal biases, but in the end, they can only make a decision based on the options available. To continue creating unbiased algorithms, we must stay hyper-aware of our own human biases, too.
With contribution from Karen Shimmin of the Hippo Thinks research network.
Image Credit: CC by A Health Blog