
Trust thou AI?

“The way to make machines trust-worthy is to trust them” to paraphrase Ernest Hemingway (Selected letters 1917-1961).


What are the essential prerequisites, for us consumers and professionals alike, to trust an Artificial Intelligence (AI) based product or service?

If you have followed the AI topic a bit or maybe even a lot, if you have been lucky (or not) talking to consultants about AI design, you may get the impression that if we can design a transparent, explainable, auditable AI, all is well with AI Ethics and AI Fairness until kingdom come or an AGI (an Artificial General Intelligence, that is) descends from the Clouds. We are led to believe that people, mass consumers, the not-in-the-know non-subject-matter-experts, will trust any AI-based product or service that we can “throw” at them as long as it is transparent, explainable and auditable. According to the European General Data Protection Regulation (GDPR) we have a “Right to an Explanation” of an action taken by an automated or autonomous system (see also “Article 22 – Automated individual decision-making, including profiling”). However, it should also be pointed out that the GDPR is very vague (to put it mildly) about the structure and content of such an explanation. As has also been pointed out by Wachter, Mittelstadt & Floridi (2017), GDPR does in fact not oblige autonomous decision-making systems to provide an explanation for their derived decisions; at most it offers information.

While GDPR, as it relates to AI-driven decision-making processes, may make the European Commission feel good, consultants a lot richer in monetary terms and researchers richer in academic terms, it really doesn’t do much to enhance trust between a consumer and The Thing. That is obviously not the intention of the regulation, but it is the subject of this essay.

In much of the current debate around trust in AI, transparency and explainability are frequently invoked. The two concepts are, however, described in awfully similar terms, although the descriptions are often well crafted to make them appear more different than they may be given the context. The current dogma is that if the AI is transparent (or rather, if the process that leads to an AI agent’s actions is transparent), it is also explainable, and thus it may also be more trustworthy. Basically, transparent is here used synonymously with explainable. Given we are in the realm of computer science, it is good to remember that the term transparency is often used to mean that a given property of a system is hidden (by design) from the user or other main computing processes. Interestingly enough, this is definitely not what is meant by transparency of an AI process and action. To strengthen the trust bond between humans (as well as institutions) and AI we also require auditability of a given AI-based process and action. That is, we must be able to trace back from an AI action through the internal AI computations & processes and verify how that particular action came about.

I will not say it is BS to consider transparency, explainability and auditability in your AI design. Of course, it is not! … But maybe it is a bit … to believe that this is sufficient to make consumers (or the public in general) trust an AI-based application (i.e., service, product, …). These are nice words, with fairly unclear meaning, that are (very) important for regulators and public institutions to trust corporations’ AI developments. Maybe not so much for the general public’s or consumers’ trust in the AI that corporations expose them to. As I will explain in this essay, they can only be a small part of the essentials for creating a trust bond between humans and AI.

Trust between humans, at least within what we perceive as our social group (i.e., “usness”), is a trait with evolutionary roots that has allowed us to foster collaboration within larger social groups (with some ugly limitations of “usness” and “themness”). The ability to trust may even have made it possible for us humans to crawl to the top of the “food chain” and keep that pole position for quite a while.

What about our trust in machines and non-human (non-sentient) things in general? Trust between humans and non-human agents. We are increasingly exposed to much higher degrees of system automation as well as Artificial Intelligence (AI) based applications. Machine automation and autonomy are taking many tasks over from us at home, at work and anywhere in between. This development comes with the promise of much higher productivity at work and far more convenience at home and anywhere else for that matter.

Trust in automated machines – From professionals to consumers.

If you work professionally with a complex system (e.g., an airplane, a train, energy, nuclear or chemical plants, telecommunications networks, data centers, energy distribution networks, etc.) the likelihood is fairly large that you are already exposed to a very high degree of machine and system automation. You may even be increasingly exposed to system autonomy fueled by AI-based solutions (e.g., classical machine learning models, deep learning algorithms, recurrent neural networks, reinforcement learning, rule-based control functions, etc.). As a professional or expert operator of automation, you embrace such systems if you have deemed them trustworthy. That typically means that the automation solution (a) performs consistently, (b) is robust to many different situations that may occur and even some that may very rarely occur, and (c) has a very high degree of reliability (e.g., higher than 70%). Further, it is important for your trust that you believe you understand the automation principles. All of this (and more) helps to strengthen the trust bond between you and the automation. If there is a lack of trust or a break in trust between the human operator and the automation, it will lead to wasted investments, inefficiencies and disappointing productivity growth. It may also lead to accidents and potential disasters (Sheridan & Parasuraman, 2005). If human operators lack trust in a system automation or autonomous application, you are better off relying on manual workarounds.

Clearly, it is no longer only certain types of jobs and workers that are exposed to automation and AI-based autonomy. All of us … irrespective of background … will increasingly be experiencing AI-based applications that may initiate actions without human intervention or without first “asking” for human permission. The trust bond between a human and an autonomous application is essential for that application to become successful and do what it was designed to do. By successful I primarily mean increased and sustainable utilization. Thus we need to better understand the dynamics of trust between humans and non-human intelligent entities. What can we learn and expect from human-human trust bonds, and what is different in human-non-human trust bonds? We are already being exposed to highly specialized artificially intelligent agents, in complex system designs as well as in simpler commercial products, applications and services in general.

While businesses deploying algorithmic-based automation and autonomy for their products and services can learn a lot from past research, they will have to expand on this work to also include their customers, who are not subject matter experts or skilled automation operators. A you-and-me focus is required. The question that I ask in this essay is how we in general feel about trusting an artificially intelligent entity (i.e., an agent) that eventually may out-compete most of us in the work environment or at least disrupt it very substantially. An AI entity that can replicate and evolve much faster in comparison with humanity’s incredibly slow evolutionary progress.

Trust … It starts with your brain.

The feeling of trust arises in your brain. It is a result of changes in your brain chemistry. Your feeling of trust is an interpretation of your emotional states triggered by physiological changes (Barrett, 2017). The physiology of trust also connects to your gut and other parts of your body via the central nervous system. The resulting physiological reaction, e.g., change in heart rate, goose bumps, that weird feeling in your stomach, sense of well-being, sense of unease or dread, etc., makes you either trust or want to run away. The brain chemistry will either suppress your fear or enhance your sense of unease. The more novel a trust situation is, the more unease or fear (i.e., emotions) you will feel about making the leap of faith required to initiate the trust bonding process.

However, the more prior knowledge we have of a given trust situation, including from other parties that we already trust, the easier it becomes for us to engage in trust. This process is eloquently described by Robert Sapolsky in his seminal work “Behave: The Biology of Humans at Our Best and Worst” (Sapolsky, 2017) and in the original research work by Paul Zak on the trust-enhancing effect of the brain molecule Oxytocin (Kosfeld, Heinrichs, Zak, Fischbacher & Fehr, 2005; Zak, 2017; Choleris, Pfaff, & Kavaliers, 2013). Our little “trust” messenger (Oxytocin) has been credited with all groovy good things in this universe (at least for vertebrates), backed up with lots of cool trust game variations (including sniffing the little bugger), and academic research in general. One of Oxytocin’s basic functionalities, apart from facilitating mother-baby bonding and milk production, is to inhibit our brain’s fear center (i.e., the amygdala), allowing for a higher degree of acceptance of uncertain situations (it’s a bit more complex than that, but this suffices for now) and thus making us more susceptible to certain risks. While Oxytocin certainly drives a lot of wonderful behaviors (e.g., maternal/paternal instincts, trust, love, commitment to partner, etc.) it has a darker side as well. In general, oxytocin reduces aggression by inhibiting our brain’s fear center. However, when we perceive that our young children (or your pups, for the prairie voles reading this blog) are in danger or being threatened, oxytocin works in the opposite direction, enhancing the fear and resulting in an increased level of aggression. See also Sapolsky’s wonderful account of the dark side of oxytocin (“And the Dark Side of These Neuropeptides”, Kindle location 1922) in his book “Behave” (Sapolsky, 2017).


Oxytocin: to be or maybe not to be the trust hormone? A 2015 review by Nave et al. (Nave, Camerer and McCullough, 2015) of the relevant literature linking Oxytocin to trust concludes that current research results do not provide sufficient scientific evidence that trust is indeed associated with Oxytocin or even caused by it. In general, it has been challenging to reproduce earlier findings proving (beyond statistical doubt) the causal relationship between Oxytocin and the establishment of trust bonds between humans. Thus, it is up to you, dear reader, whether you trust the vast number of studies in this area or not. That Oxytocin plays a role in pair-bonding as well as parent-child bonding seems pretty solid (Law, 2010; Sapolsky, 2017). Also, there appears to be a correlation between increased Oxytocin levels (by sniffing the stuff or by more natural means) and increased readiness to trust (Zak, 2017; Choleris, Pfaff & Kavaliers, 2013). Interestingly (men, do pay attention here!), increased levels of oxytocin in women, typically women with young children who are still breastfeeding, appear to make them less forgiving when they perceive that their trust has been betrayed (Yao, Zhao, Cheng, Geng, Lou & Kendrick, 2014).

Can a puff and a sniff of Oxytocin make us trust non-human-like agents, e.g., automation SW, AI-based applications, autonomous systems (e.g., cars, drones), factory robots, avionic systems (e.g., airplanes, flight control), etc., as we trust other humans? … The answer is no! … or at least it does not appear so. Human-human trust bonding is very particular to being human. Human-non-Human trust dynamics may be different and not “fooled” by a sniff of Oxytocin. Having frequent puffs of Oxytocin will not make you love your machine or piece of intelligent software … unless, as it also appears, it is more human-like. And that might also have its limits due to the uncanny valley “sense”, i.e., our amygdala starts ringing its alarm bells ever so softly because the entity we interact with is too human-like and yet a little bit off. Enough to get the uncanny or uneasy feeling going.

The trustworthiness of automation.

It has long been established that we tend to use automation only when we find it trustworthy (see for example the work of Madhavan & Wiegmann, 2007; Visser, Monfort, Goodyear, Lu, O’Hara, Lee, Parasuraman & Kruger 2017; Balfe & Wilson, 2018). If we do not trust an automation, it will be rejected by the human operator, just like an untrustworthy human will be left alone. When the reliability of an automation is no better than about 70%, it is in general regarded as useless by its human operators and becomes an operational and financial liability (Wickens & Dixon, 2007). It is important to note that much of the human-automation trust research has focused on professional and expert users of complex or advanced automated systems, such as pilots, air traffic controllers, train operators, robotics plant controllers, chemical & nuclear plant operators, brokers, military technology operators (e.g., drones, autonomous vehicles, …), communications network controllers, etc.

So … what matters for establishing a trust bond between human and automation? A large body of research shows us that the most important factors for establishing a trust bond between a human and an automation function are: reliability (of automation), consistency (of automation), robustness (of automation), dependability (of human operator), faith (of human operator) and understandability (of human operator). Much of this is fairly similar to what we require from another human being to be regarded as trustworthy.

Okay, we have a reasonable understanding of trust bonds between humans, and between humans and automation enablers. What about Human and AI trust bonds? Given that an AI-based complex system might have a higher degree of autonomy than an automated advanced system, it may very well be that the dynamics of trust and trustworthiness are different. At least compared to what we today believe we understand about Human-Automation trust.

For sure it is no longer only experts or professional operators that are being exposed to advanced automation and autonomous systems. For sure these systems are no longer limited to people who have been professionally trained or schooled, often over many years, before they are let loose on such advanced systems. Autonomous systems and AI-based applications are increasingly present in everyone’s everyday environment. At Home. At Work. And anywhere in between. Consumers of all genders, children, pets, octogenarians, Barbie dolls and dinosaurs and so forth … we will eventually have to interface with AI-based applications. Whether we like it or not.

The current trend among consultants (in particular) is to add new trust prerequisites to the above list (if the established ones are considered at all) as Human-AI trust essentials: Explainable AIs or XAIs (i.e., can the actions of an AI be understood by humans), Transparent AIs (i.e., loosely, to fully understand why certain actions are performed and others not) and Auditable AIs (i.e., an unbiased examination and evaluation of the code and resulting actions of an AI-enabled application). While these trust prerequisites are important for experts and researchers, the question is whether they are (very) important or even relevant to the general consumer at large. … If my life insurance application was rejected, would I feel much better knowing that if I lost 40 kg, stopped smoking, was 30 years younger, lived in a different neighborhood (with twice the rental fees) and happened to be white Caucasian, I would get the life insurance, or could afford to pay 3 times the monthly insurance fee? (Obviously an AI-based outcome would be better disguised than this example.)

If you have the feeling that those 3 elements, Explainability, Transparency and Auditability, seem approximately like 1 element … well, you are not alone (but don’t tell that to the “experts”).

So … How do we feel about AI? Not just “yous” who are in the know … the experts and professionals … but you, me, and our loved ones, who will have little (real) say in their exposure to AI, automation & autonomous products and services.

You and me focus … How do we feel about AI?

We appear to be very positive about Artificial Intelligence or AI for short. Men in general more positive than women. Men with young children much more positive than any other humans. As can be seen below, it doesn’t seem like Arnold Schwarzenegger has done much to make us have strong negative feelings towards artificial intelligence and what we believe it brings with it. Though one may argue that sentiments towards robots may be a different story.

how do you feel about AI

In the above chart the choices for the question “How do you feel about AI?” have been aggregated into Negative sentiment: “I hate it”, “It scares me” and “I am uncomfortable with it”, Neutral sentiment: “I am neutral”, and Positive sentiment: “I am comfortable with it”, “I am enthusiastic about it” and “I love it”.
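The aggregation described above can be sketched in a few lines of Python. This is only a minimal illustration of the grouping, not the code used for the surveys; the function name `aggregate_sentiment` and the example answers are hypothetical.

```python
from collections import Counter

# The seven answer choices mapped to the three sentiment groups used in the chart.
SENTIMENT_GROUP = {
    "I hate it": "Negative",
    "It scares me": "Negative",
    "I am uncomfortable with it": "Negative",
    "I am neutral": "Neutral",
    "I am comfortable with it": "Positive",
    "I am enthusiastic about it": "Positive",
    "I love it": "Positive",
}


def aggregate_sentiment(answers):
    """Return the share of Negative, Neutral and Positive answers."""
    counts = Counter(SENTIMENT_GROUP[answer] for answer in answers)
    total = sum(counts.values())
    groups = ("Negative", "Neutral", "Positive")
    if total == 0:
        # No answers at all: report zero shares instead of dividing by zero.
        return {group: 0.0 for group in groups}
    return {group: counts.get(group, 0) / total for group in groups}


# Hypothetical answers, for illustration only (not survey data).
example_answers = [
    "I love it",
    "I am neutral",
    "It scares me",
    "I am comfortable with it",
]
shares = aggregate_sentiment(example_answers)
print(shares)
```

Running the same function separately on the answers from women and from men would give the kind of per-gender split discussed in this section.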

On average most of us are fairly comfortable with AI. Or more accurately we feel comfortable with what we understand AI to be (and that may again depend very much on who and what we are).

One of the observations that has come out of conducting these “how do you feel about AI?” surveys (over the last two years) is that there are gender differences (a divide may be more accurate) in how we perceive AI. This needs to be an important consideration in designing AI-based products that will have meaningful appeal for both women and men (and anyone in between for that matter). Given that most AI product developers today are male, it might be good for them to keep in mind that they are not only developing products for themselves. They actually need to consider something that will be appealing to all genders.

The chart below reflects the AI sentiment of women (808) and men (815) from a total of 1,623 respondents across 4 surveys conducted in 2017 and 2018. Most of those results have individually been reported in my past blogs. So … Women in general feel significantly less positive towards AI compared to men. Women overall have a slightly more negative than positive sentiment towards AI. Overall there are more women than men who rank their feelings as neutral. Men with children (younger than 18 years of age) have the most positive feelings towards AI of all respondents. Unfortunately, the surveys that have been carried out so far do not allow for estimating the age of the youngest child or the average age of the children. Women’s sentiment towards AI does not appear (within the statistics) to depend on whether they have children younger than 18 years of age or not, or no children at all. Overall, I find that:

Women appear to be far less positive about AI than men.

Men with young children are significantly more positive than men and women in general.

Contrary to men, women’s sentiment towards AI does not appear to depend on their maternal status.

gender divide ai.png

So why are we so positive … men clearly much more than women … about AI? This is despite the fact that AI is likely to have a substantial impact (to an extent it already has) on our society and way of living (e.g., privacy, convenience, security, jobs, social network, family life, new consumption, policies, etc.). The median age of the respondents was about 38 years, although for respondents with children (less than 18 years of age) it was about 33 years. In the next 10 years most will be less than 50 years old and should still be in employment. In the next 20 years, most will be less than 60 years old and also still very much in active employment. Certainly, the young children of the respondents will enter the workplace over the next 20 years. A workplace that may look very different from today due to the aggressive pursuit of intelligent automation and the introduction of autonomous systems.

Is the reason for the positive outlook on AI that the individual (particularly the male kind) simply does not believe the technology to be an existential threat to the individual’s current way of living?

If you think about your child or children, how do you believe AIs will impact their future in terms of job and income? … While you think about this … I will give you the result of one of the surveys (shown below) that I conducted in September 2018.

future of child.png

In terms of believing that the future will be better than today, women are less positive than men. Across genders, fewer are of the opinion that the opportunities of their children (whether they are below 18 or above) will remain the same as today. Women appear to have a more negative outlook for their children than men. There is little difference in men’s beliefs in their child’s or children’s future opportunities irrespective of the age of their children. Women with children under 18 years of age are significantly less optimistic about the outlook for their children’s opportunities than women with older children.

From the work by Frey & Osborne (2013) on how jobs are becoming susceptible to what they call computerization, there is plenty of room for concern about individuals’ job and thus income security. According to Frey and Osborne, 47% of total US employment is at risk within a decade or two. A more recent PwC economic analysis estimates that the impact of algorithmic & AI-based automation across all industries will be in the order of 20% by the late 2020s and 30% by the late 2030s (Hawksworth & Berriman, 2018). Job categories served by workers with low and medium levels of education will be hit the hardest. Women are likewise expected to be impacted more than men. Irrespective of how you slice and dice the data, many of us will over the next 2 decades have our lives, livelihoods and jobs impacted by the increased usage of intelligent automation and autonomous systems.

In order to study this a bit further, I asked surveyed respondents two questions (structured in an A and a B 50:50 partition); A: “Do you believe your job could be replaced by an AI?” and B: “Thinking of your friends, do you believe their jobs could be replaced by an AI?”.

you & your friends job impact by AI

From the above chart it is clear that when it comes to AI impacting job security, the individual feels much surer about their own job security than about that of the individual’s friends or colleagues. Only one fifth of the respondents answering Yes or No to whether they believed that their jobs could be replaced by an AI think that AI actually could replace their jobs. Interestingly, men assessing their own job security are almost twice as sure about that security as women (based on the number of Maybe answers).

From the results of the survey shown above, we assign a much higher likelihood to our friends’ and colleagues’ prospects of losing their jobs to an AI than to that happening to ourselves. Maybe it is easier to see our friends’ and colleagues’ problems & challenges than our own. Both women and men appear more uncertain in assessing their friends’ job security than their own. Although the difference in uncertainty between women and men is less dramatic here, men still appear less uncertain than women in their assessment of their friends’ job security.

There are many consultants, some researchers and corporations working on solutions and frameworks for Transparent AIs, Explainable AIs, and Auditable AIs as a path to create trust between a human and an AI-based agent. Many are working exclusively with the AI in focus and are thus very technocentric in approach. Very few have considered the human aspect of trust, such as

  • The initial trust moment – how to get the consumer to the “leap of faith moment”, where a human engages with a product or service (or another human being for that matter). This is obviously a crucial and possibly scary moment. The consumer has no prior experience (maybe a peer’s recommendation, which will help), is left to faith, and will be at their most dependent and most vulnerable to disappointment. It is clear that peer opinion and recommendation will mitigate much uncertainty and unease.
  • Sustainable trust – how to maintain sustainable trust between a user and a product (or another human being). Here priors will be available and of course consistent performance will play a big role in maintaining and strengthening the trust bond.
  • Broken trust or untrusting – as the saying goes “it takes 10 good impressions to neutralize a bad one” (something my grandmother hammered into my head throughout childhood and adolescence … Thanks Gram!) … Once trust has been broken between a human and a product or service (or another human being) it is very difficult to repair. The stronger the trust bond was prior to untrusting, the more physiologically and neurologically “violent” the untrusting process will be, as will the subsequent recovery from the feeling of betrayal. As another saying goes “Heav’n has no rage like love to hatred turn’d, Nor hell a fury, like a woman scorned” (William Congreve, 1697). And “no Oxytocin in this world will make a woman betrayed not want her pound of flesh” (Kim Larsen, 2018).
  • The utility of trust – not all trust bonds are equally important or equally valuable or equally costly; some may even be fairly uncritical (although broken trust by a thousand cuts may matter in the long run). The neurological – feeling process of untrust may even be fairly benign in the sense of how the trustor feels upon the broken trust. Though the result may be the same: having a customer or loved one walk away from you. It may be easier to recover trust from such more benign untrust events. However, it stands to reason that the longer a trust bond exists, the more painful and costly the untrusting process will be, and obviously the more difficult it will be to recover from.

In most cases, if the AI action is as the human agent would expect, or has anticipated, many a human might not care about the transparency or explainability of the artificial agent’s action.

Even when your trust has been satisfied by an AI-based action, we should care about auditability, in case the human trust in an AI-based solution turns out, over the longer run, to have been misplaced; that is, the AI-based outcome of a given action is counter to what the human was expecting or anticipating. An explanation for the outcome may not prevent the trust of the human agent, and the trustworthiness of the AI-based agent, from being broken.

trust circle

Trust deconstructed.

If you know everything absolutely, you would not need to trust anyone to make a decision.

Just be careful about the vast number of cognitive biases that may result in you falsely believing you know it all. Men in particular suffer from the ailment of believing their own knowledge to be absolute (Larsen, 2017).

Someone who knows nothing has only faith as a guide for trust.

On the other hand, someone who knows nothing about a particular problem has no other source for trust than faith that trust is indeed warranted. It’s a scary place to be.

Let’s deconstruct trust.

An agent’s trust (the trustor) is an expectation about a future action of another agent (the trustee). That other agent has been deemed (at least temporarily) trustworthy by the trustor. That other agent (the trustee) may also represent a given group or system.

John K. Rempel’s 1985 paper “Trust in close relationships” defines the following attributes of human-to-human trust (i.e., where both trustor and trustee are human agents):

  • Predictability or consistency – trustor’s subjective assessment of trustee’s trustworthiness. Prior behavior of the trustee is an important factor for the trustor to assess the posterior expectations that the trusted agent will consistently fulfil the trustor’s expectations of a given action (or in-action). As the trustor gathers prior experience with the trustee, the confidence in the trustee increases. Confidence should not be confused with faith, which is a belief in something without having prior fact-based knowledge.
  • Dependability – a willingness to place oneself as trustor in a position of risk that the trustworthiness of the trustee turns out not to be warranted, with whatever consequences that may bring. Note that dependability can be seen as an outcome of consistency. Put another way, a high degree of consistency/predictability reduces the fear of dependability.
  • Faith – is a belief that goes beyond any available evidence required to accept a given context as truth. It is characterized as an act of accepting a context outside the boundaries of what is known (e.g., a leap of faith). We should not confuse faith with confidence although often when people claim to be confident, what they really mean is that they have faith.

For agent-to-agent first-interaction scenarios, the initial trust moment, without any historical evidence of consistency or predictability, a trustor would need to take a leap of faith in whether another agent is trustworthy or not. In this case, accepting (i.e., believing) the trustee to be trustworthy, the trustor would need to accept a very large degree of dependability towards the other agent and accept the substantial risk that the trust in the trustee may very well not be warranted. This scenario for humans often lends itself to maximum stress and anxiety levels of the trusting agent.

After some degree of consistency or historical trustworthiness has been established between the two agents, the trustor can assign a subjective expectation of future trustworthiness to the other agent. This then leads to a lesser subjective feeling of dependability (or exposure to risk) as well as maybe a reduced dependency on sheer faith that trust is warranted. This is in essence what one may call sustainable trust.
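One way to make the move from a leap of faith to sustainable trust concrete is a simple Beta-Bernoulli update, sketched below. This is my own toy illustration, not a model taken from Rempel or from the automation-trust literature: the class `TrustEstimate`, its method names and the uniform prior of one met and one unmet expectation are all assumptions. The posterior mean plays the role of predictability, while the share of the prior in the total evidence stands in for how much the trustor still leans on faith.

```python
class TrustEstimate:
    """Toy Beta-Bernoulli sketch of a trustor's expectation about a trustee."""

    def __init__(self, prior_met=1.0, prior_unmet=1.0):
        # Pseudo-counts standing in for "faith" before any experience exists
        # (a uniform prior; this choice is an assumption of the sketch).
        self.prior_strength = prior_met + prior_unmet
        self.met = prior_met
        self.unmet = prior_unmet

    def observe(self, expectation_met):
        """Record one interaction in which the trustee did or did not deliver."""
        if expectation_met:
            self.met += 1
        else:
            self.unmet += 1

    def expected_trustworthiness(self):
        """Posterior mean: the trustor's subjective expectation (predictability)."""
        return self.met / (self.met + self.unmet)

    def reliance_on_faith(self):
        """Share of the prior in all evidence; 1.0 means no experience at all."""
        return self.prior_strength / (self.met + self.unmet)


trust = TrustEstimate()
# Initial trust moment: no history, so the estimate rests entirely on the prior.
print(trust.expected_trustworthiness(), trust.reliance_on_faith())

# Sustainable trust: a run of met expectations raises the estimate and
# shrinks the weight of the prior.
for _ in range(8):
    trust.observe(True)
print(trust.expected_trustworthiness(), trust.reliance_on_faith())
```

The sketch deliberately ignores dependability, i.e., how much is at stake for the trustor, which would need its own term in any more serious model.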

As long as the trustor is a human, the other agent (i.e., the trustee) can be anything from another human, machine, complex system, automation, autonomous system, institution (public and private), group, and so forth. Much of what is described above would remain the same.

Lots of work has been done on trust bonds in Human-Automation relationships. How about trust bonds between Humans and AI-enabled applications (e.g., services and products in general)? In their 2018 article “The Future of Artificial Intelligence Depends on Trust”, Rao and Cameron (both from PwC) describe 3 steps towards achieving human – AI-system trust:

  • Provability – predictability and consistency.
  • Explainability – justification for an AI-based decision (e.g., counterfactual constructions). Note transparency and explainability may be closely related depending on how one implements explainability.
  • Transparency – factors influencing algorithm-based decisions should be available (or even visible) to users impacted by such decisions. E.g., for a rejected health insurance (all) factors impacting the negative decision to reject the application should be available to the applicant.

Rao and Cameron’s suggestions appear reasonably important for trust. However, as previously described, these suggestions pretty much relate to the trustee agent side of things, ignoring some of the other important human factors (e.g., dependability, faith, assessment of risk, etc.) for trust between a human and another agent (sentient or otherwise).

Further, explainability and transparency may be particularly important when trust is broken (assuming that the trustor cares to “listen”) between the human agent and the AI-based agent (or any other digital or non-sentient agent for that matter). It may not be terribly relevant for the likely vast majority of users where an action is delivered confirming that trust was warranted. If you have trained your AI well, it would be fair to assume that the majority of outcomes are consistently as expected. That is a positive trust event that is likely to lead to a reinforcement of trust and trustworthiness of the AI-agent.

Also, these concepts, while important, don’t do much for the initial step of trusting a non-human agent. How do you design your trustee agent to ease the initial barrier of use and acceptance? When there are no priors, you need the user or trustor to be comfortable with taking a leap of faith, as well as with being maybe maximally dependent on the agent.

Untrust and that human feeling of betrayal.

Trust can be broken. Trustworthiness can decline. Untrusting is the process where a previous trust bond has been broken and the strength of trust has declined.

Heuristic: the stronger the trust bond between two agents, the stronger the untrusting process will be when trust is broken, making trust recovery all the more difficult.

Have you ever wondered why two people who supposedly have loved each other in the past (supposedly for many years) can treat each other as enemies? Betraying a strong trust bond can be a very messy, emotionally and physiologically strenuous process. Some broken trust bonds will never recover (e.g., breakups, friendship betrayals, unfaithfulness, theft, lies, …). Others, depending on the initial utility or value assigned to the bond, may be fairly benign, without many strong emotions associated with the untrusting process (e.g., retail purchases, shopping experiences, low-value promises of little impact if not fulfilled, etc.).

The question is whether the untrusting of a human-machine trust bond is similar to the untrusting of a human-human trust bond. Moreover, is there a difference between an inanimate machine, simpler human-operated automated systems and an AI-based application that humans may even anthropomorphize to various degrees? Are your trust and untrust processes different for Siri or Alexa than they are for Microsoft Clippy, assuming anyone ever really trusted that wicked steely fellow?

How valid is it to use our knowledge of human-human trust & untrust on Human-Agent relations, with the Agent being non-human or a human simulacrum in nature?

In human we trust, in machines not so much.

Would you trust your superior or fellow expert with a critical corporate decision? How often would you trust such decisions made by other fellow human beings?

Even if you don’t have a choice or a final say (well, apart from speaking your mind … at least as it happens in most places of Western Europe) … it is your own choice whether you trust such a decision or not.

As shown in the magenta columns of the chart below, it turns out that most humans frequently do trust their superiors and fellow human experts with critical decisions relevant to their work. In the survey shown below there is little difference in human-human trust whether a decision success rate was left unspecified or specified to be 70% (i.e., 7 out of 10 decisions turn out as expected or promised and 3 out of 10 do not). This might mean that most people heuristically expect a corporate decision maker to have a 70% success rate in their decisions. I found this surprising as I do not believe human decisions are that good. But I guess we are good at post-rationalization and at being much louder about our successes than our failures (suppressing the bad memories of failure may come in handy here).

[Figure: trust in human vs AI decision makers]

Okay, we clearly trust our fellow humans with critical decision making (or at least so we say). Do we trust an AI with the same critical corporate decision?

The answer is … clearly … No Way do we trust AIs to make critical corporate decisions (or any other types of decisions for that matter … at least those we are aware of). As can be seen from the above chart, a majority of people would only infrequently trust an AI making critical decisions. Specifying that the AI has a decision success rate better than 70% does reduce the share of people who would only infrequently trust such decisions (i.e., from 62% to 43%). However, it only marginally increases the share of people who would frequently trust an AI-based critical decision, from 13% to 17% (which is barely statistically significant). Remember, we are readily willing to trust a human decision maker frequently. An AI? … not so much! Even in what should be regarded as an apples-to-apples scenario, with the same performance specified for the human trustee as for the AI-based trustee.

Trust bonds between humans appear much stronger than those with an AI. Though that may not be too surprising. Most of us have very little prior experience with trusting AI-based decisions (at least that we are consciously aware of). So the starting point for AI-based trust (i.e., AI being the trustee part of the trust bond) is Faith and accepting Dependability rather than having a basis for assessing Consistency or Predictability of AI-based decisions. There may also be some very interesting neurological (i.e., brain) reasons why our ability to trust an inanimate agent such as an AI, a Robot or a piece of intelligent machinery is different from our ability to trust a human being.

My survey data could be interpreted as showing that we work with a heuristic decision success rate for humans (or at least the manager or expert kind of humans) at or better than 70%. More than half of us would frequently trust a human decision maker at such a performance level.

Not so much with an AI-based (inanimate) decision maker. Specifying that the AI has a success rate of 70% or better in its decision making doesn’t really change the proportion of us that would frequently trust such decisions. It does, however, increase the share of us trustors who would trust an AI-based decision about half the time (i.e., given the 70% success rate).

What moves the trust needle? If we impose on our underappreciated AI-based decision maker a 95% or better success rate, 40% of us would frequently trust such decisions. This is still a lower proportion of trustors than for a human decision maker with a success rate of 70% or better. Moreover, almost 1 in 3 of us would still only infrequently trust such an AI (with a 95% or better success rate). In comparison, only about 1 in 10 would only infrequently trust a human decision maker with a 70% or better success rate.

[Figure: trust in AI decision makers at different specified success rates]

So clearly AI does have trust issues. Certainly with respect to decision making, AI is not regarded as being as trustworthy as a human. The bar for trusting an AI appears to be very high.

However, it seems reasonable that part of the reason for the lower trust level is simply that most people haven’t had a lot of exposure to AI in general, or to AI-based augmentation and actions where trust would be essential.

Algorithmic aversion – nothing really new under the sun.

As described in “On the acceptance of artificial intelligence in corporate decision making” (Larsen, 2017), algorithms, even simple ones, do in general perform better at predictions than human beings limited to their own cognitive abilities (predictions being an essential part of decision making, whether done consciously or subconsciously). This result has been confirmed many times over by the likes of Paul Meehl (Meehl, 1954), Robyn Dawes (Dawes, 1979) and many other researchers in the last 50 – 60 years. Clearly, machine learning algorithms do not offer an error-free approach to decision making. However, algorithmic approaches do offer predictions and solutions with lower, often much lower, error rates. And not unimportantly … quantifiable error rates, in comparison with what would be the case for decisions based on human cognition.

Humans remain very resistant to adopting more mathematical approaches, despite such approaches being demonstrably less prone to error than human-based decision making without algorithmic augmentation. As Berkeley Dietvorst’s paper puts it, “People erroneously avoid algorithms after seeing them err” (Dietvorst, Simmons and Massey, 2014). Dietvorst calls this behavior or emotion algorithmic aversion. This is very consistent with my own findings of humans having a very high bar of success rate (or accuracy) for AI-based decisions. Even at a 95% success rate of an AI-based decision, we prefer to trust a human decision maker with a success rate of 70%.

Machine-learning (at least the classical kind) based decisions or action recommendations offer better accuracy, transparency, understandability, consistency, robustness and auditability than most human-based decisions and actions.

Despite this, we, as humans, are much less forgiving when it comes to machine errors than human errors. The standards we expect of artificial intelligence are substantially higher than what we would require from a fellow human being or co-worker.

Trust in corporations and institutions … or, more accurately, the lack of it.

Almost 80% of consumers do not believe that companies using AI have their best interest in mind. This is the outcome of 3 surveys made in March 2018, April 2018 and September 2018.

This has also been a period where misuse of consumer information and data in general was hotly debated. So it is maybe not all that surprising that the majority of consumers do not trust corporations to have their best interest in mind. Consumer trust in corporations is in general at a low point. AI doesn’t help that trust issue.

[Figure: consumer trust in companies using AI]

Companies’ AI-based products and services are already at a disadvantage before they hit the marketplace. There is a substantial degree of mistrust among consumers towards corporations and companies. This resonates very well with a recent study of trust by ….

What about trust in public institutions’ capacity for protecting citizens and consumers against adversarial use of AI-based technologies for policies and in products and services? Well, the public trust is fairly low, as can be seen from the figure below.

[Figure: trust in public institutions]

The vast majority (80%!) of the general public has low, very low or no confidence that political institutions adequately consider the medium and long-term societal impact of AI proliferation.

There is unfortunately nothing surprising in the above (dis)trust level in institutions. This is largely confirmed by, for example, the 2018 Edelman Trust Barometer, which is pretty bleak in terms of its “Global Trust Index” reflecting the general population’s level of trust in institutions (e.g., government, businesses, media and NGOs).

So where do we go from here?

It is fair to say that for the consumer as well as for the corporate decision maker, expectations towards the trustworthiness of AI-based products, services and resulting decisions or actions in general are low.

Despite the relatively low trust in AI-based actions, I have also shown that on average we feel fairly comfortable with AI, at least as a concept. Women, as it would appear from my surveys, are in general less comfortable with AI than men. Men with children under 18 years of age (possibly younger children) express the highest degree of positive feelings towards AI.

The gender difference in how AI is perceived for the individual as well as for children, family members, friends and colleagues is still a relatively unexplored research area. It needs more attention, as most research into human-machine trust bonding has been centered around professional operators of automated or autonomous complex systems (e.g., aviation, automotive, networks, etc.). I feel brave enough to make an educated guess that most of that research has also been focused on male operators and experts rather than being gender balanced or explicitly gender focused.

In order for us to trust something, like an AI-based action (e.g., decision, recommendation, …), we often require an explanation for a given outcome or action. Most of us do like to receive an explanation, in particular for actions and outcomes that we perceive as having negative consequences or that run counter to our beliefs of what should be the right decision or action. Explainable AI, whatever that really means (it will surely be context dependent), is one of the components of establishing trust. Explainability is also important in order to appease law & policy makers, e.g., to comply with the European General Data Protection Regulation (GDPR) requirements that may (or may not) be interpreted as a “Right to Explanation”. AI Transparency and AI Auditability are additional concepts that are typically mentioned together with explainable AI.

Typically the implied logic is that transparency leads to explainability, which in turn leads to ease of auditability. The question is whether such requirements in general are meaningful for the consumer of an AI-based product or service. There are two extremes: 1. A highly simplified system that can also be explained very simply, or 2. A highly complex AI-based system that nevertheless is sufficiently transparent to be explained and audited, but where the explanation is of such complexity that, albeit transparent, it would only be understood by an expert or the designer of that system. In one case the explanation for a given action is so simple that it is unnecessary. In the other the explanation is so complex that no lay person would be able to follow it. Certainly much more work is required here in order to assess to what level and under which circumstances an explanation should be provided. It is always understood (although not always mentioned) that the explanation should be understood by the operator or user. Now that makes for an interesting challenge … Right?

As has been pointed out above, making a human trust a non-human agent is not only a matter of explainability (assuming the explanation is understood). Any trust bond will have a utility or perceived value associated with it. The initiation of a trust bond may be faith based if no prior information is available. This initial phase is often associated with a degree of anxiety or fear that your trust will not be fulfilled. There may be a high degree of dependability involved in the trust bond (e.g., autonomous driving) that adds to the anxiety. Only after prior experience or information becomes available will the importance of faith and the anxiety around the assumed dependability diminish. The strength of the trust bond will then increase. However, as the trust increases it will also be increasingly sensitive to disappointment and perceived betrayal (also depending on the utility assigned to the bond). Too little work has been conducted on understanding gender and cultural differences in the human-AI trust process. This is also true in general for any human-non-human trust relationship.

Some recent work indicates that anthropomorphizing (i.e., humanizing) the automation or AI-based agent appears to trigger neurochemical processes important in human-human trust bonds. See some pretty cool experiments on the importance of anthropomorphizing automation agents by Visser et al. (Visser, Monfort, Goodyear, Lu, O’Hara, Lee, Parasuraman & Krueger 2017) in their paper “A little anthropomorphism goes a long way: Effects of Oxytocin on Trust, Compliance, and Team Performance with Augmented Agents”. The question here is how far we can take humanizing AI. Will there be an uncanny valley effect at some point? Moreover, not all AI functions should be humanized (that would be scary if even possible). Context clearly matters here. Lots of questions, still many answers outstanding and thus lots of cool research to be pursued.

Additional sources of wisdom.

Balfe N., Sharples S., and Wilson J.R., (2018). “Understanding is key: An analysis of factors pertaining to trust in a real-world automation system”. Human Factors, 60, pp. 477-495. Due to its recent publication you will find a good, up-to-date account (as well as bibliography) of the state of the art of human-automation trust research. This work establishes a strong connection between trust in and the understanding of automation.

Barrett L.F., (2017). “How emotions are made: the secret life of the brain”. Houghton Mifflin Harcourt.

Baumgartner T., Fischbacher U., Feierabend A., Lutz K., and Fehr E., (2009). “The Neural Circuitry of a Broken Promise”. Neuron, 64, pp. 756 – 770.

Bergland, C., (2015). “The Neuroscience of Trust”, August.

Choleris, E., Pfaff, D. and Kavaliers, M., (2013). “Oxytocin, vasopressin, and related peptides in the regulation of behavior”. Cambridge: Cambridge University Press.

Dawes R.M., (1979), “The robust beauty of improper linear models in decision making”. American Psychologist, 34, pp. 571-582.

Denson T.F., O’Dean S.M., Blake K.R., and Beames J.R., (2018). “Aggression in women: behavior, brain and hormones”. Frontiers in Behavioral Neuroscience, 12, pp. 1-20 (Article-81).

Dietvorst B.J., Simmons J.P. and Massey C., (2014). “Algorithm Aversion: people erroneously avoid algorithms after seeing them err”. Journal of Experimental Psychology: General. A study on the widespread Algorithm Aversion, i.e., human expectations towards machines are substantially higher than towards fellow humans. This results in an irrational aversion to machine-based recommendations versus human-based recommendations, even though algorithm-based forecasts are on average better to much better than their human-based equivalents in apples-to-apples comparisons.

Doshi-Velez F. and Kortz M., (2017). “Accountability of AI under the law: the role of explanation”. Harvard Public Law, 18, pp. 1-15. Focuses on the right to an explanation and what that might mean. Also very relevant to the European GDPR Article 22. Do note that whether Article 22, and Articles 13-15 as well, really do grant a user the right to an explanation is a matter of debate, as pointed out by Wachter et al. (2017).

Fischer K., (2018). “When transparency does not mean explainable”. Workshop on Explainable Robotic Systems (Chicago, March).

Frey C.B. and Osborne M.A., (2013). “The future of employment: how susceptible are jobs to computerization?”. Technological Forecasting and Social Change, 114, pp. 254-280.

Hawksworth J. and Berriman R., (2018). “Will robots really steal our jobs? An international analysis of the potential long term impact of automation”. PwC report.

Hiltbrand T., (2018), “3 Signs of a Good AI Model”, November.

Ito J., (2018). “The limits to explainability”, Wired (January).

Kosfeld M., Heinrichs M., Zak P.J., Fischbacher U., and Fehr E., (2005). “Oxytocin increases trust in humans”. Nature, 435, pp. 673-676.

Kramer R.M., (2009), “Rethinking Trust”. Harvard Business Review, June.

Larsen, K., (2017). “On the acceptance of artificial intelligence in corporate decision making: a survey”.

Law S., (2010), “Dads, too, get hormone boost while caring for baby”, October. Oxytocin is not only for women breastfeeding. Research shows that men too have increased levels of oxytocin coinciding with child caring and physical contact with their child and their spouse (the mother of their child).

Madhavan P. and Wiegmann D.A., (2007), “Similarities and differences between human-human and human-automation trust: an integrative review”. Theoretical Issues in Ergonomics Science, 8, pp. 277-301. (unfortunately behind paywall).

Meehl, P. E., (1954). “Clinical versus statistical prediction: A theoretical analysis and review of the literature”. University of Minnesota, pp. 1-161. Algorithmic versus human performance up to the 50s is very well accounted for in Paul Meehl’s research work and his seminal book. It is clear that many of the topics we discuss today are not new.

Mori, M., MacDorman, K. and Kageki, N. (2012). “The Uncanny Valley [From the Field]”. IEEE Robotics & Automation Magazine, 19(2), pp. 98-100.

Nave G., Camerer C., and McCullough M., (2015), “Does Oxytocin Increase Trust in Humans? A Critical Review of Research”. Perspectives on Psychological Science, 10, pp. 772-789. A critical review of research into Oxytocin’s key role in social attachment, including its effect of increased trust in human individuals with increased (above normal) levels of Oxytocin. Nave et al. conclude that current results do not provide sufficiently robust evidence that trust is associated with Oxytocin or even caused by it.

Rao A. and Cameron E., (2018), “The Future of Artificial Intelligence Depends on Trust”. Strategy+Business, July. Making the case for transparent, explainable and auditable AIs and why those concepts are important for the development of trust between humans and AI.

Rempel J.K., Holmes, J.G. and Zanna M.P., (1985), “Trust in close relationships”. Journal of Personality and Social Psychology, 49, pp. 95–112. (unfortunately behind paywall; however, it is imo a super good account of trust in human-to-human relations).

Sapolsky R.M., (2017). “Behave: The Biology of Humans at Our Best and Worst”. Penguin Press. Robert Sapolsky addresses trust across his epic book from a neurobiological and behavioral perspective. Yes, you should read it!

Sheridan T.B. and Parasuraman R., (2005), “Chapter 2: Human-Automation Interaction”. Reviews of Human Factors and Ergonomics, 1, pp. 89 – 129. This is a really authoritative account of human interaction with automation as we find it in complex large-scale systems (e.g., aircraft, aircraft control, robotics-intensive manufacturing plants, modern communications networks, modern power plants, chemical industries and infrastructure, modern as well as autonomous vehicles & drones, etc.).

Simpson J.A., (2007), “Psychological Foundations of Trust”. Current Directions in Psychological Science, 16, pp. 264-268.

Visser, E.J.d., Monfort S.S., Goodyear K., Lu L., O’Hara M., Lee M.R., Parasuraman R., and Krueger F., (2017), “A little anthropomorphism goes a long way: Effects of Oxytocin on Trust, Compliance, and Team Performance with Augmented Agents”. The Journal of the Human Factors and Ergonomics Society, 59, pp. 116-133.

Wachter S., Mittelstadt B., and Floridi L., (2017). “Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation”. International Data Privacy Law, pp. 1-47. Wachter et al. claim that Article 22 (or other articles for that matter) does not express that users of automated decision-making applications have a right to an explanation. If anything, a user may at most have a right to information about the decision process. For me it solved a puzzle, as there is nowhere in Article 22 any mention of an explanation, more of a right to opt out. Articles 13 to 15 (of GDPR) only offer limited information about the process by which a given decision has been made (e.g., 15 and 14 are maybe the strongest articles with respect to information provision).

Wachter S., Mittelstadt B., and Russell C., (2018). “Counterfactual explanations without opening the black box: automated decisions and the GDPR”. Harvard Journal of Law & Technology, 31, pp. 1-52.

Whitbourne S.K., (2015), “Why betrayal hurts so much (and who seeks revenge)”, April.

Wickens C.D. and Dixon S.R., (2007), “The benefit of imperfect diagnostic automation: a synthesis of the literature”. Theoretical Issues in Ergonomics Science, 8, pp. 201-212. (unfortunately behind paywall). Wickens & Dixon have reviewed data from 20 studies, from which they derive a reliability cross-over point of about 70%. Below 70%, no automation was regarded as better than automation. Only above 70% reliability did automation bring positive cost-benefit returns.

Yao S., Zhao W., Cheng R., Geng Y., Luo L., and Kendrick K.M., (2014). “Oxytocin makes females, but not males, less forgiving following betrayal of trust”. International Journal of Neuropsychopharmacology, 17, pp. 1785-1792.

Zak P.J., (2017), “The Neuroscience of Trust”. Harvard Business Review, January-February.

2018 Edelman Trust Barometer (Global Report).


I rely on many for inspiration, discussions and insights. Any mistakes made are my own. In particular I would like to thank Liraz Margalit and Minoo Abedi for many useful suggestions and great inspirational discussions around the topic of trust. I also gratefully acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this Blog.

AI Ethics, AI Policy, AI strategy, Algorithmic, Artificial Intelligence, Bias, Ethics, Machine Learning

Machine … Why ain’t thee Fair?

“It is better that ten guilty persons escape, than that one innocent party suffer.”, Sir William Blackstone (1765) paraphrased.


Machines mess up. Humans even more so. The latter can be difficult, even impossible, to really understand. The former is a bit more straightforward. This short essay describes how we can get an idea of some of the root causes of machine model errors, particularly as those errors relate to group bias and unfairness. It’s elementary really, as Jonny Lee Miller would say. Look at your model’s confusion, defined by its false positives and negatives as well as its true results. Reflect on this overall as well as for well-defined groups that exist within your sample population under study. My intention is to point out (the obvious maybe?) that the variations in each of your attributes, fed into your machine learning model, will determine the level of confusion that your model ultimately will have towards individual groups within your larger population under study. That model confusion may cause group biases and unfair treatment of minority groups lost in the resolution of your data and chosen attributes.

Intelligent machines made in our image in our world.

We humans are cursed by an immense amount of cognitive biases clouding our judgments and actions. Maybe we are also blessed by being, for most parts of life, largely ignorant of those same biases. We readily forgive our fellow humans’ mistakes. Even grave ones. We frequently ignore or are unaware of our own mistakes. However, we hold machines to much stricter standards than our fellow humans. From machines we expect perfection. From humans? … well the story is quite the opposite.

Algorithmic fairness, bias, explainability and ethical aspects of machine learning are hot and popular topics. Unfortunately, maybe more so in academia than elsewhere. But that is changing too. Experts, frequently academic scholars, are warning us that AI fairness is not guaranteed even as recommendations and policy outcomes are being produced by non-human means. We do not avoid biased decisions or unfair actions by replacing our wet biological carbon-based brains, subject to tons of cognitive biases, with another substrate for computation and decision making that is subjected to information coming from a fundamentally biased society. Far from it.

Bias and unfairness can be present (or introduced) at many stages of a machine learning process. Much of the data we use for our machine learning models reflects society’s good, bad and ugly sides. For example, data being used to train a given algorithmic model could be biased (or unfair) either because it reflects a fundamentally biased or unfair partition of the subject matter under study or because the data has become biased (intentionally or unintentionally) in the data preparation process. Most of us understand the concept of GiGo (i.e., “Garbage in Garbage out”). The quality of your model output, or computation, reflects the quality of your input. Unless corrected (often easier said than done), it is understandable that the outcome of a machine learning model may be biased or fundamentally unfair if the data input was flawed. Likewise, the machine learning architecture and model may also introduce (intentional as well as unintentional) biases or unfair results even if the original training data was unbiased and fair.

At this point, you should get a bit uneasy (or impatient). I haven’t really told you what I actually mean by bias or unfairness. While there are 42 (i.e., many, but 42 is the answer to many things unknown and known) definitions out there defining fairness (or bias), I will define it as “a systematic and significant difference in outcome of a given policy between distinct and statistically meaningful groups” (note that in case of in-group systematic bias it often means that there actually are distinct sub-groups within that main group). So, yes this is a challenge.

How “confused” is your learned machine model?

When I am exploring outcomes (or policy recommendations) of my machine learning models, I spend a fair amount of time trying to understand the nature of my false positives (i.e., predicted positive outcomes that should have been negative) as well as false negatives (predicted negative outcomes that should have been positive). My tool of choice is the so-called confusion matrix (see figure below), which summarizes your machine learning model’s performance in terms of its accuracy as well as its inability to predict outcomes. It is a simple construction. It is also very powerful.

[Figure: confusion matrix]

The above figure provides a confusion matrix example of a loan policy subjected to machine learning. We have

  • TRUE NEGATIVE (Light Blue color): Model suggests that the loan application should be rejected, consistent with the actual outcome of the loan being rejected. This outcome is a loss-mitigating measure and should be weighed against new business versus the risk of default when providing a loan.
  • FALSE POSITIVE (Yellow color): Model suggests that the loan application should be approved, in opposition to the actual outcome of the loan being rejected. Note that once this model is operational this may lead to increased risk of financial loss to the business offering loans that the applicant is likely to default on. It may also lead to negative socio-economic impact on the individuals that are offered a loan they may not be able to pay back.
  • FALSE NEGATIVE (Red color): Model suggests that the loan application should be rejected, in opposition to the actual outcome of the loan being accepted. Note that once this model is operational this may lead to loss of business by rejecting a loan application that otherwise would have had a high likelihood of being paid back. It may also lead to negative socio-economic impact on the individuals being rejected, due to lost opportunities for individuals and community.
  • TRUE POSITIVE (Green color): Model suggests that the loan application should be approved, consistent with the actual outcome of the loan being approved. This provides for new business opportunities and increased topline within an acceptable risk level.
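To make the four cells above a bit more tangible, here is a minimal sketch in Python of how they could be counted. The function name `confusion_counts` and the toy lists are my own assumptions for illustration only; they are not taken from any real loan data or from the model behind the figure.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for binary outcomes.

    Here 1 means "loan approved" and 0 means "loan rejected".
    """
    counts = Counter({"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1   # model approves, actual outcome approved
        elif p == 1 and a == 0:
            counts["FP"] += 1   # model approves, actual outcome rejected
        elif p == 0 and a == 1:
            counts["FN"] += 1   # model rejects, actual outcome approved
        else:
            counts["TN"] += 1   # model rejects, actual outcome rejected
    return counts

# Hypothetical toy data, invented for illustration only.
actual = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

cm = confusion_counts(actual, predicted)
false_positive_rate = cm["FP"] / (cm["FP"] + cm["TN"])
false_negative_rate = cm["FN"] / (cm["FN"] + cm["TP"])
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(dict(cm), false_positive_rate, false_negative_rate, accuracy)
```

Running the same counting separately for each group in your population is what turns this from a model-quality check into a (rough) fairness check.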

The confusion matrix will identify the degree of bias or unfairness that your machine learning model introduces between groups (or segments) in your business processes and in your corporate decision making.

The following example (below) illustrates how the confusion matrix varies with changes to a group’s attribute distributions, e.g., variance differences (or standard deviation), mean value differences, etc.

[Figure: confusion matrix example for groups with different attribute distributions]

What is obvious from the above illustration is that the policy outcome on a group basis is (very) sensitive to the attribute’s distribution properties between those groups. Variations in the attributes between groups can elicit biases that ultimately may lead to unfairness between groups but also within a defined group.
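The sensitivity described above can be reproduced with a small simulation. What follows is only a sketch under assumptions of my own: a single attribute, a hidden factor the model cannot see, a shared decision threshold at zero, and two hypothetical groups (“wide” and “narrow”) that differ only in the spread of the attribute. None of the numbers come from the illustration above.

```python
import random

def simulate_group(n, mean, std, rng):
    """Return (actual, predicted) outcome lists for one hypothetical group."""
    actual, predicted = [], []
    for _ in range(n):
        x = rng.gauss(mean, std)          # observed attribute
        hidden = rng.gauss(0.0, 1.0)      # factor the model cannot see
        actual.append(x + hidden > 0.0)   # would the loan actually be fine?
        predicted.append(x > 0.0)         # model decides on the attribute alone
    return actual, predicted

def error_rates(actual, predicted):
    """False positive and false negative rates for boolean outcomes."""
    fp = sum(p and not a for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    negatives = sum(not a for a in actual)
    positives = sum(a for a in actual)
    return {"FPR": fp / negatives, "FNR": fn / positives}

rng = random.Random(42)  # fixed seed so the sketch is repeatable
rates = {
    # Same mean and same decision rule; only the attribute spread differs.
    "wide": error_rates(*simulate_group(20000, 0.0, 2.0, rng)),
    "narrow": error_rates(*simulate_group(20000, 0.0, 0.5, rng)),
}
for name, r in rates.items():
    print(name, round(r["FPR"], 3), round(r["FNR"], 3))
```

In this toy setup the group whose attribute values cluster close to the threshold (“narrow”) ends up with clearly higher false positive and false negative rates, although both groups are treated by exactly the same rule.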

Thus, the confusion matrix leads us back to your chosen attributes (or features), their statistical distributions, the quality of your data or measurements that make up those distributions. If your product or app or policy applies to many different groups, you better understand whether those groups are treated the same, good or bad. Or … if you intend to differentiate between groups, you may want to be (reasonably) sure that no unintended bad consequences will negatively expose your business model or policy.

A word of caution: even if the confusion matrix gives your model the “green light” for production, you cannot by default assume that the results produced will not lead to systematic group bias and ultimately unfairness against minority groups. Moreover, in real-world implementations you are unlikely to completely free your machine models from errors that may lead to a certain degree of systematic bias and unfairness (however small).

Indeterminism: learning attributes reflects our noisy & uncertain world.

So, let’s say that I have a particular policy outcome that I would like to check for bias (and possibly unfairness) against certain defined groups (e.g., men & women). Let’s also assume that the intention with the given policy was to have a fair and unbiased outcome without group dependency (e.g., independence of race, gender, sexual orientation, etc.). The policy outcome is derived from a number of attributes (or features) deemed important but excludes obvious attributes that are thought likely to cause the policy to systematically bias towards or against certain groups (e.g., women). In order for your machine model to perform well it needs in general lots of relevant data (rather than Big Data). For each individual in your population (under study), you will gather data for the attributes deemed relevant for your model (and maybe some that you don’t think matter). Each attribute can be represented by a statistical distribution reflecting the variation within the population or groups under study. It will often be the case that an attribute’s distribution is fairly similar, though not identical, between different groups, either because it really is slightly different for different groups or because your data “sucks” (e.g., due to poor quality, too little data to resolve subtle differences, etc.).

If a policy is supposed to be unbiased, I should not be able to predict with any (statistical) confidence which group a policy taker belongs to, given the policy outcome and the attributes used to derive the policy. Or in other words, I should not be able to do better than what chance (or base rate) would dictate.

For each attribute (or feature) deemed important for our machine learning model, we either have, or we collect, lots of data. Furthermore, for each of the considered attributes we will have a distribution represented by a mean value and a variance (and higher-order moments of the distribution such as skewness, i.e., the asymmetry around the mean, and kurtosis, i.e., the shape of the distribution’s tails). Comparing two (or more) groups, we should be interested in how each attribute’s distribution compares between those groups. These differences or similarities will point towards why a machine model ends up biased against a group or groups, and they will ultimately be a significant factor in why your machine model ends up being unfair.
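As a rough illustration of that comparison, the sketch below computes the mean, variance, skewness and (excess) kurtosis of one attribute for two groups. The function name, the group labels and the numbers are made up for the example; population (not sample) moments are assumed, and the attribute must not be constant within a group.

```python
import math

def moments(values):
    """Mean, population variance, skewness and excess kurtosis of a list.

    Assumes at least two distinct values, so the standard deviation is
    non-zero.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(variance)
    skewness = sum((x - mean) ** 3 for x in values) / n / std ** 3
    # Subtracting 3 gives "excess" kurtosis (0 for a normal distribution).
    kurtosis = sum((x - mean) ** 4 for x in values) / n / std ** 4 - 3.0
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "excess_kurtosis": kurtosis,
    }

# Hypothetical values of a single attribute for two groups.
attribute_by_group = {
    "group_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "group_2": [1.0, 1.0, 1.0, 1.0, 6.0],
}
summary = {g: moments(v) for g, v in attribute_by_group.items()}
for group, stats in summary.items():
    print(group, {name: round(value, 3) for name, value in stats.items()})
```

Attributes whose summaries differ noticeably between groups are the first candidates to inspect when a model treats those groups differently.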

Assume that we have a population, consisting of two (main) groups, that we are applying our new policy to (e.g., loans, life insurance, subsidies, etc.). If each attribute for both groups has statistically identical distributions, then … no surprise really … there should be no policy outcome difference between one or the other group. Even more so, unless there are attributes that are relevant for the policy outcome and have not been considered in the machine learning process, you should end up with an outcome that has (very) few false positives and negatives (i.e., the false positive & false negative rates are very low), as determined by the variance level of your attributes and the noise level of your measurements. Thus, we should not observe any difference between the two groups in the policy outcome, including the level of false positives and negatives.

policy outcome & attributes

From the above chart it should be clear that I can machine learn a given policy outcome for different groups given a bunch of features or attributes. I can also “move” my class tags over to the left side and attempt to machine learn (i.e., predict) my classes given the attributes that are supposed to make up that policy. It should be noted that if two different groups’ attributes only differ (per attribute) in their variances, it may not be possible to reliably predict which class belongs to what policy outcome.

Re: Fairness. It is in general more difficult to judge whether a policy is fair or not than whether it is biased. One would need to look at between-class (or between-group) as well as in-class differentiation. For example, based on the confusion matrix, it might be unfair for members of a class (i.e., a sub-class) to end up in the false positive or false negative categories (i.e., in-group unfairness). Further along this line, one may also infer that if two different classes have substantially different false positive and false negative distributions, this might reflect between-class unfairness (i.e., one class is treated less poorly than another). Unfairness could also be reflected in how True outcomes are distributed between groups and maybe even within a given group. To be fair (pun intended), fairness is a much richer, context-dependent concept than a confusion matrix can capture (although the matrix will signal that attention should be given to unfairness).

When two groups have statistically identical distributions for all attributes considered in the policy making or machine learning model, I would also fail to predict group membership based on the policy outcome or the policy’s relevant attributes (i.e., sort of intuitively clear). I would be no better off than flipping a coin in identifying a group member based on attributes and policy. In other words, the two groups should be treated similarly within that policy (or you don’t have all the facts). This is also reflected by the confusion matrix having approximately the same value in each position (i.e., if normalized it would be ca. 25% at each position).
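The coin-flip argument can be checked with a small simulation. What follows is only a sketch under my own assumptions: one Gaussian attribute, two equally sized groups, and a simple nearest-centroid classifier standing in for "the machine". All names and parameter values are hypothetical.

```python
import random

def sample_group(n, mean, std, rng):
    """Draw n values of a single attribute for one group (Gaussian assumed)."""
    return [rng.gauss(mean, std) for _ in range(n)]

def group_prediction_matrix(mean_a, mean_b, std=1.0, n=4000, seed=1):
    """Guess group membership from one attribute with a nearest-centroid
    rule and return the normalized 2x2 matrix on held-out data.

    Keys are (actual group, predicted group); values sum to 1.
    """
    rng = random.Random(seed)
    centroid_a = sum(sample_group(n, mean_a, std, rng)) / n
    centroid_b = sum(sample_group(n, mean_b, std, rng)) / n

    def predict(x):
        return "A" if abs(x - centroid_a) <= abs(x - centroid_b) else "B"

    test = [("A", x) for x in sample_group(n, mean_a, std, rng)]
    test += [("B", x) for x in sample_group(n, mean_b, std, rng)]
    cells = {("A", "A"): 0, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 0}
    for actual, x in test:
        cells[(actual, predict(x))] += 1
    return {key: count / len(test) for key, count in cells.items()}

def accuracy(matrix):
    """Share of held-out individuals whose group was guessed correctly."""
    return matrix[("A", "A")] + matrix[("B", "B")]

same = group_prediction_matrix(mean_a=0.0, mean_b=0.0)
shifted = group_prediction_matrix(mean_a=0.0, mean_b=3.0)
print("identical distributions, accuracy:", round(accuracy(same), 3))
print("shifted mean, accuracy:", round(accuracy(shifted), 3))
```

With identical distributions the accuracy should hover around the 50% base rate with roughly a quarter of the individuals in each cell, while a clear shift in the mean makes group membership easy to learn.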

policy outcome

As soon as an attribute’s (statistical) distribution starts to differ between different classes, the machine learning model is likely to result in a policy outcome difference between those classes. Often you will see that any statistically meaningful difference in just a few of the attributes that may define your policy will result in uniquely different policy outcomes and thus possibly identify bias and fairness issues. Conversely, it will also quickly allow a machine to learn a given class or group given those attribute differences and thus allude to class differences in a given outcome.

Heuristics for group comparison

If the attribute distributions for different groups are statistically similar (per attribute) for a given policy outcome, your confusion matrix should be similar across any group within your chosen population under study, i.e., all groups are (treated) similar.

If attribute distributions for different groups are statistically similar (per attribute) and you observe a relatively large ratio of false positives or false negatives, you are likely missing significant attributes in your machine learning process.

If two groups have very different false positive and/or false negative ratios, you are either (1) missing descriptive attributes or (2) seeing a large difference in distribution variation (i.e., standard deviation) for at least some of your meaningful attributes. The latter may have to do with poor data quality in general, higher noise in the data, sub-groups within the group making that group a poor comparative representative, etc.

If one group’s attributes have larger variations (i.e., standard deviations) than the “competing” group, you are likely to see a higher than expected ratio of false positives or negatives for that group.

Just as you can machine learn a policy outcome for a particular group given its relevant attributes, you can also predict which group belongs to what policy outcome from its relevant attributes (assuming there is an outcome differentiation between them).

Don’t equate bias with unfairness or (mathematical) unbiasedness with fairness. There is much more to bias, fairness and transparency than what a confusion matrix might be able to tell you. But it is the least you can do to get a basic level of understanding of how your model or policy performs.
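The heuristic above about larger attribute variation can be illustrated with a toy simulation. Here I assume, purely for the sake of the example, that the extra variation is measurement noise: the actual outcome depends on a true score, while the model only sees a noisy version of it. The threshold rule, the noise levels and the function name are all hypothetical.

```python
import random

def error_rates_for_noise(noise_std, n=20000, seed=7):
    """Simulate one group and return (false positive rate, false negative rate).

    Assumption: the actual outcome is "true score > 0" and the model
    predicts "observed score > 0", where observed = true + Gaussian noise.
    """
    rng = random.Random(seed)
    tp = fp = tn = fn = 0
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        observed = true_score + rng.gauss(0.0, noise_std)
        actual = true_score > 0.0
        predicted = observed > 0.0
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return fp / (fp + tn), fn / (fn + tp)

# Two hypothetical groups that differ only in how noisy their data is.
low_noise = error_rates_for_noise(noise_std=0.2)
high_noise = error_rates_for_noise(noise_std=1.0)
print("low-noise group  (FPR, FNR):", tuple(round(r, 3) for r in low_noise))
print("high-noise group (FPR, FNR):", tuple(round(r, 3) for r in high_noise))
```

Under these assumptions the noisier group ends up with clearly higher false positive and false negative rates, even though the same decision rule is applied to both groups.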

Machine … Why ain’t thee fair?

Understanding your attributes’ distributions, and in particular their differences between your groups of interest, will prepare you upfront for some of the obvious as well as the more subtle biases that may occur when you apply machine learning to complex policies or outcomes in general.

So to answer the question … “Machine … why ain’t thee fair?” … It may be that the machine has been made in our own image with data from our world.

The Good news is that it is fairly easy to understand your machine learning model’s biases and resulting unfairness using simple tools such as the confusion matrix and by understanding your attributes (as opposed to just “throwing” them into your machine learning process).

The Bad news is that correcting for such biases is not straightforward and may even result in unintended consequences leading to other biases or policy unfairness (e.g., by correcting for bias against one group, your machine model may increase bias against another group, which arguably might be construed as unfair against that group).

Additional sources

Julia Angwin & Jeff Larson, “Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks” (May 2016), ProPublica. See also the critique of the ProPublica study; Flores et al.’s “False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.”” (September 2016) Federal Probation 80.

Alexandra Chouldechova (Carnegie Mellon University), “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments” (2017).

Rachel Courtland, “Bias detectives: the researchers striving to make algorithms fair” (Nature, 2018, June).

Kate Crawford (New York University, AI Now Institute) keynote at NIPS 2017 and her important reflections on bias; “The Trouble with Bias”.

Arvind Narayanan (Princeton University) great tutorial; “Tutorial: 21 fairness definitions and their politics”.

Kim Kyllesbech Larsen, “A Tutorial to AI Ethics – Fairness, Bias and Perception” (2018), AI Ethics Workshop.

Kim Kyllesbech Larsen, “Human Ethics for Artificial Intelligent Beings” (2018), AI Strategy Blog.


I rely on many for inspiration, discussions and insights. In particular for this piece I am indebted to Amit Keren & Ali Bahramisharif for their suggestions of how to make my essay better as well as easier to read. Any failure from my side in doing so is on me. I also gratefully acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this Blog.