ProPublica came out with an excellent piece this week by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner looking at the algorithms used by the US justice system to predict the recidivism rate of people at different stages within the justice system. These predictive scores, called risk assessment scores, are then used to determine things like sentencing and bail amounts, with those who have a high probability of committing future crimes getting higher sentences than those with lower scores. Lo and behold, these scores are also correlated with race, so if a black man and a white man commit the same crime, the black man is more likely to have a higher risk assessment score, which could result in a tougher prison sentence or a higher bail.

As someone who builds predictive models for a living, I am dismayed at the misuse of predictive algorithms for the purposes of discrimination, though I don’t think it’s the algorithm itself that is the problem, or at least not for the reasons that others seem to think.

Why do some predictive models give racist results?

In 2016 in the United States we are living in a white supremacist society where black men and women face institutionalized racism that leads to fewer opportunities for jobs and credit and greater risk of arrest and violence. Because of institutionalized racism, there are many undesirable things that correlate with being black, such as lower income, likelihood of participating in certain types of crimes, increased likelihood of being arrested for said crimes, and increased recidivism rates for those who have served their term and are released from jail. This is a classic case of correlation does not equal causation. Being black is correlated with these things not because there is something innate about black people that causes them to commit crimes but because we live in a society where external hardships are imposed on black people because they are black and those hardships themselves are correlated with crime.

Since being black is correlated with higher recidivism, if we build a predictive model it will likely result in higher risk assessment scores for people who are black. It’s not so much that the model is being racist but that it is reflecting the reality of racism. In fact, if black people have higher rates of recidivism and the model scores didn’t reflect that, then the model wouldn’t be considered very accurate.

The problem with the actual risk assessment model that ProPublica looked into, created by the company Northpointe, isn’t just that it gives higher scores to people who are black. In fact, the model overestimates the recidivism rate of black individuals and underestimates it for white individuals. The model results don’t match reality in ways that really are racist against black people.
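To make "overestimates" concrete, here is a minimal sketch of the kind of check that would reveal such a mismatch: compare the average predicted risk with the rate that was actually observed, separately for each group. The records below are invented toy values with placeholder group labels, not Northpointe or ProPublica data, and the function name is hypothetical.

```python
from collections import defaultdict

def calibration_by_group(records):
    """Return {group: (mean predicted risk, observed reoffence rate)}."""
    # Per group: [sum of predicted risks, number who reoffended, number of people]
    totals = defaultdict(lambda: [0.0, 0, 0])
    for group, predicted, reoffended in records:
        totals[group][0] += predicted
        totals[group][1] += int(reoffended)
        totals[group][2] += 1
    return {g: (s / n, r / n) for g, (s, r, n) in totals.items()}

# Invented toy records: (group label, predicted probability, reoffended?)
toy_records = [
    ("A", 0.6, True), ("A", 0.6, False), ("A", 0.6, False), ("A", 0.6, False),
    ("B", 0.3, True), ("B", 0.3, True), ("B", 0.3, False), ("B", 0.3, False),
]

for group, (predicted, observed) in sorted(calibration_by_group(toy_records).items()):
    gap = predicted - observed
    print(f"group {group}: predicted {predicted:.2f}, observed {observed:.2f}, gap {gap:+.2f}")
```

A positive gap means the scores overestimate that group's recidivism and a negative gap means they underestimate it. A model that matched reality for both groups would show gaps near zero.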

The fact that the model is overestimating recidivism of black people means that if you have a group of black people and a group of white people with the exact same external characteristics (or at least the exact same characteristics that are inputted into the model), the individuals who are black are less likely to commit future crimes than otherwise equal individuals who happen to be white. Ironically, the way to make the model more accurate and less racist would be to include race as a factor in the model. This would control for race by lowering the scores of black people slightly to reflect their lower chance of committing future crimes and the accuracy of scores for both black and white people would go up.

All of that said, we can’t really know what “all else being equal” means because the algorithm itself is secret.

We need to be extra careful when working with proprietary algorithms because they are surrounded by secrecy.

“Proprietary algorithm” is a term used by companies who sell their algorithm results while keeping the algorithm itself secret. They are often sold with big promises to companies and governments that this algorithm is the best around and will predict all the things, natural variation and complexity be damned. In a way, their secrecy is part of their allure, as if a company came across some piece of ancient magic that only they can wield and they will wield it for you if you pay them a hefty price.

The truth is that most proprietary algorithms are just simple regressions based on some simple data that could certainly be done easily by any statistician or data scientist for likely less than what is being paid for access to these algorithms. The very fact that they are secret is likely to mean that they are doing the minimum needed to get some sort of algorithm that maybe sort of works. Certainly not all proprietary algorithms are bullshit, but the fact that you can never really know what is in them and that they are difficult to test on your own means that companies could get away with selling a pseudo-random number generator in the place of an actual algorithm and most clients would never know the difference. Most companies selling proprietary algorithms are probably not putting forward actual random numbers, and their models might be slightly predictive or perhaps even very good, but it’s impossible for anyone outside of the company to really know and that’s the problem. There’s no good way to tell the good proprietary algorithms from the bad unless the veil of secrecy is lifted.

In the case of these risk assessment scores, they are created using a proprietary algorithm by the company Northpointe. Northpointe then sells this algorithm to prisons and courts all over the United States, even though there is little to no openness about what is in the model, how the scores are calculated, or how accurate it really is. It’s frankly criminal that a proprietary algorithm surrounded by secrecy and created by a for-profit company is being used to determine things like sentence length. We have no idea what is in the algorithm or how predictive it really is. We are putting all our trust in something that we don’t know anything about.

You have to be careful with predictive models because they can feed on themselves and exacerbate differences between different types of people.

Even if there was complete openness and the model was well-made and pretty good at predicting future crime, it still wouldn’t be right to use it to determine things like sentencing or setting bail amounts. We know that when people cannot afford their bail they end up having to spend some time in jail, which leads to future hardships like losing their job or even losing custody of their children. If we use the risk assessment model to help decide on the bail amount and one of the predictive elements in the model gives higher scores to people who live in certain zip codes, then the people who live in the high-scoring zip codes will be less likely to post bail and more likely to spend some time in jail. That time in jail and the hardships it presents can in turn lead them to be more likely to commit a future crime. If we then refresh the model by feeding more recent data into it, the model will see that people in that zip code have an even higher recidivism rate and will adjust their scores upward at an even higher rate than before, leading to more people from those zip codes not being able to post bail and even higher future crime rates for those people.

These score increases won’t be due to anything about the individuals themselves. Because those with high scores are treated in a way that increases their future likelihood of committing crimes, those differences will get exacerbated each year when we refresh the model. In other words, high scores, when used in this manner, lead directly to higher bail amounts, which lead to higher recidivism, which in turn leads to higher scores. Any biases that exist in the original algorithm will widen over time as the algorithm feeds on itself.
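The loop described above can be sketched as a toy simulation. Everything in it is an assumption made up for illustration: the two zip code labels, the starting scores, the `harm_per_detention` parameter, the idea that the share of people unable to post bail equals the score, and the idea that the hardship from detention carries over from one refresh to the next. It is not an estimate of any real effect and has nothing to do with how Northpointe's model actually works.

```python
def simulate_refreshes(initial_scores, harm_per_detention=0.2, refreshes=5):
    """Toy feedback loop: scores drive detention, detention raises reoffending,
    and each refresh adopts the newly observed (inflated) rate as the score."""
    scores = dict(initial_scores)
    history = [dict(scores)]
    for _ in range(refreshes):
        updated = {}
        for zip_code, score in scores.items():
            # Assumption: the share of people who cannot post bail equals the score.
            detained_share = score
            # Assumption: detention adds to reoffending on top of the previous rate.
            observed_rate = min(1.0, score + harm_per_detention * detained_share)
            # The refreshed model learns the inflated rate as the new score.
            updated[zip_code] = observed_rate
        scores = updated
        history.append(dict(scores))
    return history

# Hypothetical zip codes that start with only a small gap in scores.
history = simulate_refreshes({"zip_high": 0.35, "zip_low": 0.30})
for step, scores in enumerate(history):
    gap = scores["zip_high"] - scores["zip_low"]
    print(f"refresh {step}: high {scores['zip_high']:.3f}, "
          f"low {scores['zip_low']:.3f}, gap {gap:.3f}")
```

Under these assumptions the gap between the two zip codes grows with every refresh even though nothing about the individuals has changed. With different assumptions, for example if the harm from detention did not carry over between refreshes, the gap could behave differently, so this is only a sketch of the mechanism.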

Algorithms aren’t magic balls and we should never use them to punish.

Lastly, I just want to add that even if this algorithm was open and accurate and we could somehow find a way around the issue of the model feeding on itself, it would still not be right under any circumstances to use in determining things like prison sentences or bail amounts. Algorithms like this give a likelihood of a future action, but a likelihood is never 100%. Even if someone scores a 90% on our open and accurate version of the model, it would still mean that one in ten people with that score will not commit a future crime. We should not be punishing people for a probability or for a hypothetical future crime.
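The arithmetic behind that "one in ten" is simple enough to write down. This is a small sketch with a hypothetical function name and an invented group size, assuming the score is a well-calibrated probability.

```python
def expected_non_reoffenders(group_size, predicted_risk):
    """Expected number of people in the group who will NOT reoffend,
    assuming the predicted risk is an accurate probability."""
    return round(group_size * (1 - predicted_risk))

# Hypothetical group of 1,000 people who all score 90%.
print(expected_non_reoffenders(1000, 0.9))
```

Even with a perfectly accurate model, about 100 of those 1,000 people would be given a tougher sentence or higher bail for a crime they were never going to commit.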

That said, there are legitimate ways an algorithm like this one could be used. For example, inmates who fall into high-risk recidivism groups may be put in special programs that have been shown to reduce recidivism rates. Instead of punishing them with longer sentences, they would be given some extra attention along the way in order to help them cope better when later released from prison. In fact, according to the ProPublica piece, even the creator of the Northpointe risk assessment algorithm says that the original purpose of the algorithm was to determine who might be good candidates for treatment programs, not to determine things like sentencing, although that hasn’t stopped him and his company from knowingly marketing the algorithm to the justice department for those uses.

In other words, the problem with this algorithm isn’t so much that its underlying equation is racist but that using secretive algorithms to predict future crimes and then punishing people for those hypothetical crimes does not accomplish anything other than add to the institutionalized racism that already exists within the criminal justice system.

Jamie Bernstein

Jamie is a data, stats, policy and economics nerd who sometimes pretends she is a photographer. You can usually find her at skeptic events in Chicago or on Twitter or Flickr. She also blogs about music at Notes From Chicago Music Underground.

4 Comments

  1. May 25, 2016 at 1:06 pm —

    Thank you for writing the awesome article I wanted to write but didn’t have the time or talent to. As far as algorithms that predict an individual’s behavior go, this one is actually “pretty accurate.” Something like this for a marketing campaign or similar would be pretty powerful.

    But pretty accurate is NOT acceptable when it comes to making individual decisions that seriously impact human lives.

    All of the yes.

    • May 25, 2016 at 4:42 pm —

      Unfortunately, algorithms like this are already in play for large numbers of real world life-affecting things.

      Your ability to buy a home is contingent on your credit rating, which is in turn determined by learning systems that are fed arbitrary biographical information much like this one is.

      • May 25, 2016 at 5:25 pm —

        I don’t think credit ratings are perfect, far from it. But it is a much more transparent system than this one we are discussing. We know what goes into a credit rating, and how to manage it, we all have access to our own credit rating and reports. People are aware that it has limitations, we have access to the information agencies use when creating it and can attempt to correct inaccurate data. Plus credit ratings change over time with your behavior. So if you are able to improve your financial standing, your credit rating will improve over time. It allows for the idea that people can change.

        Plus credit ratings do not use “behavioral” evaluations or questions, which have their own set of issues.

        I guess I view consumer decisions like buying a home or a car as less life transforming than spending more time or no time in jail or prison. There is certainly an argument to be made that that is not the case.

  2. May 26, 2016 at 6:31 pm —

    I had a sort of interesting thought on this subject.. We in the US seem to treat laws, and more specifically law enforcement, like the “solution” to the problems. But, this is backwards. Cops are sanitation workers. They exist to clean up the mess that was made further up the food chain. Fundamentally they should be like the repair crews at Mammoth Sky resort, where my dad once worked, and back when the original owner/creator ran it, and they had their own staff. When the boss came in and saw everyone sitting around, drinking coffee, he reacted by saying, “This is what I like to see. If you are doing nothing, it means everything else is working, everywhere else.”

    A cop that spends 99.9% of their time chasing down bad guys is like a sewer worker, who has to dig up the pipes in the road every 2-3 days, to fix the sewer pipes – because **everyone upstream is flushing things down the pot that plug up the pipes**. Only.. you find out that the reason they are doing this is because some bloody genius is either a) convincing people that they should flush garbage, instead of putting it out for the trash man, or, worse, b) another genius has decided that “black neighborhoods don’t need trash services”. No sensible person would put up with having the street torn up every few days, because 10,000 people are flushing the remains of their TV dinners down the toilet, due to some joker having decided to cut funding needed to run trash trucks in half the freaking city. But.. 10,000 people stealing, selling drugs, running gangs, robbing, killing, etc. each other, because the same joker defunded the school system, among other stupidities? Gosh, the answer must be to do the equivalent of passing a three strikes law for flushing TV dinners, and hiring more sewer workers!!

    Like I said… we seem to have entirely the **wrong** view of the profession. They are not repair men, who go out and fix shit. They are sanitation workers, in most cases – merely cleaning up the mess **after** the fact. And, you can’t “fix” the problems by hiring more people to merely clean them up, after they already happened.
