ProPublica came out with an excellent piece this week by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner looking at the algorithms used by the US justice system to predict the recidivism rate of people at different stages within the justice system. These predictive scores, called risk assessment scores, are then used to determine things like sentencing and bail amounts, with those who have a high probability of committing future crimes getting longer sentences than those with lower scores. Lo and behold, these scores are also correlated with race, so if a black man and a white man commit the same crime, the black man is more likely to have a higher risk assessment score, which could result in a tougher prison sentence or a higher bail.
As someone who builds predictive models for a living, I am dismayed at the misuse of predictive algorithms for the purposes of discrimination, though I don’t think it’s the algorithm itself that is the problem, or at least not for the reasons that others seem to think.
Why do some predictive models give racist results?
In 2016 in the United States we are living under a white supremacist society where black men and women face institutionalized racism that leads to fewer opportunities for jobs and credit and greater risk of arrest and violence. Because of institutionalized racism, there are many undesirable things that correlate with being black, such as lower income, likelihood of participating in certain types of crimes, increased likelihood of being arrested for said crimes, and increased recidivism rates for those who have served their term and are released from jail. This is a classic case of correlation does not equal causation. Being black is correlated with these things not because there is something innate about black people that causes them to commit crimes but because we live in a society where external hardships are imposed on black people because they are black, and those hardships themselves are correlated with crime.
Since being black is correlated with higher recidivism, if we build a predictive model it will likely result in higher risk assessment scores for people who are black. It’s not so much that the model is being racist but that it is reflecting the reality of racism. In fact, if black people have higher rates of recidivism and the model scores didn’t reflect that, then the model wouldn’t be considered very accurate.
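This mechanism can be sketched with a tiny simulation. Everything below is hypothetical and invented for illustration, not Northpointe's actual model or real rates: recidivism is driven by an external "hardship" factor that is more common in one group, so a model that never sees race still ends up scoring that group higher.

```python
import random

random.seed(0)

# Hypothetical simulation: recidivism is driven by an external "hardship"
# factor (unemployment, prior arrests, etc.) that institutionalized racism
# makes more common in group B. Race is never an input to the model.
def make_person(group):
    hardship = random.random() < (0.6 if group == "B" else 0.3)
    reoffends = random.random() < (0.5 if hardship else 0.2)
    return {"group": group, "hardship": hardship, "reoffends": reoffends}

people = [make_person(g) for g in ("A", "B") for _ in range(10_000)]

# A bare-bones "model" trained only on hardship: a person's score is the
# observed recidivism rate among everyone with the same hardship value.
def rate(subset):
    return sum(p["reoffends"] for p in subset) / len(subset)

score_given_hardship = {h: rate([p for p in people if p["hardship"] == h])
                        for h in (True, False)}

def avg_score(group):
    members = [p for p in people if p["group"] == group]
    return sum(score_given_hardship[p["hardship"]] for p in members) / len(members)

print(f"group A: {avg_score('A'):.3f}, group B: {avg_score('B'):.3f}")
```

Group B ends up with a noticeably higher average score even though race appears nowhere in the scoring rule; the model is simply reflecting the correlated hardship.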
The problem with the actual risk assessment model that ProPublica looked into, created by the company Northpointe, isn’t just that it gives higher scores to people who are black. In fact, the model overestimates the recidivism rate of black individuals and underestimates it for white individuals. The model results don’t match reality in ways that really are racist against black people.
The fact that the model is overestimating the recidivism of black people means that if you have a group of black people and a group of white people with the exact same external characteristics (or at least the exact same characteristics that are fed into the model), the individuals who are black are less likely to commit future crimes than otherwise equal individuals who happen to be white. Ironically, the way to make the model more accurate and less racist would be to include race as a factor in the model. This would control for race by lowering the scores of black people slightly to reflect their lower chance of committing future crimes, and the accuracy of scores for both black and white people would go up.
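Here is a numerical sketch of why a race-blind model has this problem. The rates below are invented for illustration (ProPublica's analysis, not these numbers, establishes the direction of the error): if, among people with identical model inputs, black individuals actually reoffend less often than white individuals, a race-blind model must give both groups the same pooled score, overestimating one and underestimating the other.

```python
import random

random.seed(1)

# Assumed-for-illustration rates among people with IDENTICAL model inputs:
# black individuals reoffend at 30%, white individuals at 40% (the
# direction of error ProPublica reported, with invented magnitudes).
true_rate = {"black": 0.30, "white": 0.40}
people = [(group, random.random() < true_rate[group])
          for group in ("black", "white") for _ in range(20_000)]

# A race-blind model can only give everyone with these inputs ONE score:
# the pooled reoffense rate across both groups.
pooled_score = sum(reoffends for _, reoffends in people) / len(people)

actual = {}
for g in ("black", "white"):
    outcomes = [reoffends for group, reoffends in people if group == g]
    actual[g] = sum(outcomes) / len(outcomes)
    print(f"{g}: score {pooled_score:.3f} vs actual {actual[g]:.3f}")
```

The single pooled score sits between the two groups' actual rates, so it overestimates recidivism for the group with the lower true rate and underestimates it for the other; conditioning on race would let each group's score match its outcomes.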
Saying all of that, we can’t really know what “all else being equal” means because the algorithm itself is secret.
We need to be extra careful when working with proprietary algorithms because they are surrounded by secrecy.
“Proprietary algorithm” is a term used by companies who sell their algorithm’s results while keeping the algorithm itself secret. These algorithms are often sold with big promises to companies and governments that this algorithm is the best around and will predict all the things, natural variation and complexity be damned. In a way, their secrecy is part of their allure, as if the company had come across some piece of ancient magic that only it can wield, and it will wield it for you if you pay a hefty price.
The truth is that most proprietary algorithms are just simple regressions based on some simple data that could easily be built by any statistician or data scientist, likely for less than what is being paid for access to these algorithms. The very fact that they are secret likely means that they are doing the minimum needed to get some sort of algorithm that maybe sort of works. Certainly not all proprietary algorithms are bullshit, but the fact that you can never really know what is in them, and that they are difficult to test on your own, means that companies could get away with selling a pseudo-random number generator in place of an actual algorithm and most clients would never know the difference. Most companies selling proprietary algorithms are probably not putting forward actual random numbers, and their models might be slightly predictive or perhaps even very good, but it’s impossible for anyone outside the company to really know, and that’s the problem. There’s no good way to tell the good proprietary algorithms from the bad unless the veil of secrecy is lifted.
In the case of these risk assessment scores, they are created using a proprietary algorithm by the company Northpointe. They then sell this algorithm to prisons and courts all over the United States, even though there is little to no openness about what is in the model, how the scores are calculated, or how accurate it really is. It’s frankly criminal that a proprietary algorithm surrounded by secrecy and created by a for-profit company is being used to determine things like sentence length. We have no idea what is in the algorithm or how predictive it really is. We are putting all our trust in something that we don’t know anything about.
You have to be careful with predictive models because they can feed on themselves and exacerbate differences between different types of people.
Even if there were complete openness and the model were well-made and pretty good at predicting future crime, it still wouldn’t be right to use it to determine things like sentencing or bail amounts. We know that when people cannot afford their bail they end up having to spend some time in jail, which leads to future hardships like losing their job or even losing custody of their children. If we use the risk assessment model to help decide on the bail amount, and one of the predictive elements in the model gives higher scores to people who live in certain zip codes, then the people who live in the high-scoring zip codes will be less likely to post bail and more likely to spend some time in jail. That time in jail and the hardships it presents can in turn make them more likely to commit a future crime. If we then refresh the model by feeding more recent data into it, the model will see that people in that zip code have an even higher recidivism rate and will adjust their scores upward even further, leading to more people from those zip codes being unable to post bail and even higher future crime rates for those people.
These score increases won’t be due to anything about the individuals themselves; rather, because those with high scores are treated in a way that increases their future likelihood of committing crimes, those differences will get exacerbated each year when we refresh the model. In other words, high scores themselves, when used in this manner, lead directly to higher bail amounts, which lead to higher recidivism, which in turn leads to higher scores. Any biases that exist in the original algorithm will widen over time as the algorithm feeds on itself.
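That feedback loop can be sketched in a few lines. Every number below is an assumption chosen to make the mechanism visible, not an estimate of any real effect: at each refresh, a zip code's score is last year's observed recidivism rate, and higher scores add to next year's rate through bail and pretrial jail time.

```python
# Hypothetical feedback-loop sketch; all constants are assumptions.
BASE = {"zip_a": 0.20, "zip_b": 0.25}  # underlying recidivism rates
FEEDBACK = 0.3  # extra recidivism induced per point of score, via
                # higher bail -> pretrial jail -> job/custody loss

rates = dict(BASE)
gaps = []
for year in range(6):
    scores = dict(rates)  # "refresh" the model on last year's observed rates
    # Higher scores feed back into next year's observed recidivism.
    rates = {z: BASE[z] + FEEDBACK * scores[z] for z in BASE}
    gaps.append(rates["zip_b"] - rates["zip_a"])

print([round(g, 4) for g in gaps])
```

The gap between the two zip codes starts at 0.05 and widens with every refresh, because the model keeps feeding its own output back into its training data.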
Algorithms aren’t crystal balls and we should never use them to punish.
Lastly, I just want to add that even if this algorithm were open and accurate, and we could somehow find a way around the issue of the model feeding on itself, it would still not be right under any circumstances to use it to determine things like prison sentences or bail amounts. Algorithms like this give a likelihood of a future action, but a likelihood is never 100%. Even if someone scored a 90% on our open and accurate version of the model, it would still mean that one in ten people with that score will not commit a future crime. We should not be punishing people for a probability or for a hypothetical future crime.
That said, there are legitimate ways an algorithm like this one could be used. For example, inmates who fall into high-risk recidivism groups may be put in special programs that have been shown to reduce recidivism rates. Instead of punishing them with longer sentences, they would be given some extra attention along the way in order to help them cope better when later released from prison. In fact, according to the ProPublica piece, even the creator of the Northpointe risk assessment algorithm says that the original purpose of the algorithm was to determine who might be good candidates for treatment programs, not to determine things like sentencing, although that hasn’t stopped him and his company from knowingly marketing the algorithm to the justice system for those uses.
In other words, the problem with this algorithm isn’t so much that its underlying equation is racist but that using secretive algorithms to predict future crimes, and then punishing people for those hypothetical crimes, accomplishes nothing other than adding to the institutionalized racism that already exists within the criminal justice system.