Governance of Technology and Governance Through Technology
Class Three: Bias
For each quote, we plan to discuss it, place it in its context, and understand how it fits into the broader discussion about bias.
- Quotes from Bias in Sentencing
- When a full range of crimes were taken into account — including misdemeanors such as driving with an expired license — the algorithm was somewhat more accurate than a coin flip. Of those deemed likely to re-offend, 61 percent were arrested for any subsequent crimes within two years.
-
Discussion question: there are many cases where algorithms (even AI-based ones) don’t perform as well as humans. When should we deploy AI algorithms and when should we not? Should we always deploy them if they outperform humans?
-
We also turned up significant racial disparities, just as Holder feared. In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.
- The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
- White defendants were mislabeled as low risk more often than black defendants.
- Could this disparity be explained by defendants’ prior crimes or the type of crimes they were arrested for? No. We ran a statistical test that isolated the effect of race from criminal history and recidivism, as well as from defendants’ age and gender. Black defendants were still 77 percent more likely to be pegged as at higher risk of committing a future violent crime and 45 percent more likely to be predicted to commit a future crime of any kind.
-
In this particular case, basic statistical analysis demonstrated that the algorithm was “wrong” about equally often for both groups of defendants; however, the type of “wrong” was very different. Should the system still be deployed? Are we OK with some errors?
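A minimal sketch of how this can happen, using made-up confusion-matrix counts (hypothetical numbers, not ProPublica’s actual data): two groups can have the same overall error rate while one group’s errors are mostly false positives and the other’s are mostly false negatives.

```python
# Hypothetical counts only: illustrates equal overall error rates
# with very different kinds of errors across two groups.

def rates(tp, fp, tn, fn):
    """Return (overall error rate, false positive rate, false negative rate)."""
    error = (fp + fn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn)  # non-re-offenders wrongly flagged as high risk
    fnr = fn / (fn + tp)  # re-offenders wrongly labeled as low risk
    return error, fpr, fnr

# Made-up counts: Group A's errors skew toward false positives,
# Group B's toward false negatives, yet overall accuracy is identical.
groups = {
    "Group A": dict(tp=400, fp=250, tn=250, fn=100),
    "Group B": dict(tp=250, fp=100, tn=400, fn=250),
}

for name, counts in groups.items():
    error, fpr, fnr = rates(**counts)
    print(f"{name}: overall error {error:.0%}, "
          f"false positive rate {fpr:.0%}, false negative rate {fnr:.0%}")
```

Both groups come out with a 35 percent overall error rate, but the false positive and false negative rates are mirror images. Which kind of error we care about equalizing is exactly the value judgment the quote raises.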
- Northpointe’s core product is a set of scores derived from 137 questions that are either answered by defendants or pulled from criminal records. Race is not one of the questions. The survey asks defendants such things as: “Was one of your parents ever sent to jail or prison?” “How many of your friends/acquaintances are taking drugs illegally?” and “How often did you get in fights while at school?” The questionnaire also asks people to agree or disagree with statements such as “A hungry person has a right to steal” and “If people make me angry or lose my temper, I can be dangerous.”
-
These questions matter so much now. Who should get to decide them? How do we stop privileged people who understand how they are designed from “gaming” them? How do we keep them from reinforcing historical biases? Does the demographic inference they make possible trouble us? Should judges even have access to these questions?
- James Rivelli, a 54-year old Hollywood, Florida, man, was arrested two years ago for shoplifting seven boxes of Crest Whitestrips from a CVS drugstore. Despite a criminal record that included aggravated assault, multiple thefts and felony drug trafficking, the Northpointe algorithm classified him as being at a low risk of reoffending.
- “I am surprised it is so low,” Rivelli said when told by a reporter he had been rated a 3 out of a possible 10. “I spent five years in state prison in Massachusetts. But I guess they don’t count that here in Broward County.” In fact, criminal records from across the nation are supposed to be included in risk assessments.
- Less than a year later, he was charged with two felony counts for shoplifting about $1,000 worth of tools from Home Depot. He said his crimes were fueled by drug addiction and that he is now sober.
-
Many datasets are incomplete. Chat with almost any researcher and they will tell you how their data is always woefully incomplete. Given that data is almost never complete or robust, what standards should we have in place for deciding when data is “complete” enough for use?
- Quotes from Bias in Hiring
- But it didn’t. After the company trained the algorithm on 10 years of its own hiring data, the algorithm reportedly became biased against female applicants. The word “women,” like in women’s sports, would cause the algorithm to specifically rank applicants lower.
-
How can we know what to audit for?
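A minimal sketch of the mechanism, using synthetic resumes and made-up historical labels (hypothetical data, not Amazon’s actual system): a model trained on biased past hiring decisions can learn a negative weight on the token “women” even though gender is never an explicit input, and inspecting learned weights is one concrete thing an audit can check.

```python
# Synthetic example only: shows how biased historical labels teach a
# screening model to penalize a gendered token that is never asked about.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical resumes and historical hire/no-hire labels that encode past bias:
# resumes mentioning "women's" activities were disproportionately rejected.
resumes = [
    "captain of women's chess club, python developer",
    "women's soccer team, data analyst internship",
    "chess club captain, python developer",
    "soccer team, data analyst internship",
    "women's debate society, java engineer",
    "debate society, java engineer",
]
hired = [0, 0, 1, 1, 0, 1]  # biased historical outcomes

vec = CountVectorizer()
X = vec.fit_transform(resumes)
model = LogisticRegression().fit(X, hired)

# Audit step: inspect the learned weight for the token "women"
# (CountVectorizer lowercases and drops the possessive 's').
weights = dict(zip(vec.get_feature_names_out(), model.coef_[0]))
print("weight on 'women':", round(weights["women"], 3))  # negative => down-ranked
```

In this toy setup the word “women” is the only token that separates hired from rejected resumes, so the model learns to penalize it; real systems pick up subtler proxies, which is what makes knowing what to audit for so hard.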
- After an audit of the algorithm, the resume screening company found that the algorithm found two factors to be most indicative of job performance: their name was Jared, and whether they played high school lacrosse. Girouard’s client did not use the tool.
-
Clearly, any hiring algorithm built using ML is going to produce systemic biases, or rely on the systemic biases of the institutions whose data it is trained on. Is this an unavoidable problem?
- Humans typically think that things done by machines are better than if they were done by a human. It’s a well-studied phenomenon called “automation bias.”
-
An algorithm is not much better than its designer. How can we account for this as algorithms enter our lives? How can we trust security that is based on algorithms (e.g., the Apple Face ID issues)? Who can we trust to audit them thoroughly?
- “It blew my mind that there are 10,000 industrial organization psychologists in the world, they go to school, they get Ph.Ds in this, there’s a whole body of predictive science out there that tells us what predicts a top performer and what doesn’t, but yet 98% of the world is using poor quality, crap data that does not predict that, and only introduces a boatload of bias,” says Caitlin MacGregor, CEO of Plum.
-
Code is law. At least humans are malleable and can describe nuance. Should we just employ these people and hope that they are more aware of their biases than any algorithm could possibly be aware of its own?
- Quotes from Human Bias
- experienced parole judges in Israel granted freedom about 65 percent of the time to the first prisoner who appeared before them on a given day. By the end of a morning session, the chance of release had dropped almost to zero.
-
How do we solve the difficult problem that conditions well beyond the judges’ professional setting clearly impact their judgment? More broadly, how can we build scalable systems without starting to tolerate faults like this?
- This suggests that college admissions committees are more likely to accept the first applicants they consider after a lunch break. Or that quality-control officers may be more likely to ignore possible flaws in products as a long day drags toward its close.
-
This made me feel better about the various rejections I have had from colleges, internships, and job applications. That being said: (1) socially engineer the timing of any interviews you have around this; (2) this makes a pretty strong case for handing such decisions over to algorithms, which won’t get tired and cranky. While they will have bias, at least it will be statistical, which feels less arbitrary. Is that better or worse?
- Quotes from Solving AI Bias
- The first is that addressing bias as a computational problem obscures its root causes. Bias is a social problem, and seeking to solve it within the logic of automation is always going to be inadequate.
-
Interesting take. Does making the bias statistical, and therefore somewhat random, make us feel better about the decisions or worse? Similar question to the one above.
- Second, even apparent success in tackling bias can have perverse consequences. Take the example of a facial recognition system that works poorly on women of color because of the group’s underrepresentation both in the training data and among system designers. Alleviating this problem by seeking to “equalize” representation merely co-opts designers in perfecting vast instruments of surveillance and classification.
-
Focusing on bias ignores the systemic issues that have created the bias. Should we hold off on developing AI until we can truly understand what caused the initial biases and attempt to correct them?
- Bias is real, but it’s also a captivating diversion.
-
Goes to the above question.
- Which systems really deserve to be built? Which problems most need to be tackled? Who is best placed to build them? And who decides? We need genuine accountability mechanisms, external to companies and accessible to populations.
-
Why AI? Should we really use regression to predict food recommendations, or is the social/emotional connection with a waiter a better experience? Do we appreciate the human touch of working with a doctor, or do we value their opinion more? Does increasing “efficiency” and achieving “scalability” really make humans happier?
Feedback (5 minutes)
- Did we like the readings?
- Too long? Too short?
- Did we like using GitHub? GitHub issues?