Data-driven decision making … what’s not to like about that?

Approximately 400 corporate decision makers have been surveyed about their confidence in their own corporate decision-making skills, their opinion of their peers' skills, and their acceptance of corporate data-driven decision making in general, as well as of such decision making being augmented by artificial intelligence. The survey, “Corporate data-driven decision making and the role of Artificial Intelligence in the decision making process”, reveals the general perception of the corporate data-driven environment available to corporate decision makers, e.g., the structure and perceived quality of the available data. Furthermore, the survey explores the decision makers' opinions about bias in the available data and the applied tooling, as well as their own and their peers' biases and the possible impact of those biases on their corporate decision making.

“No matter how sophisticated our choices, how good we are at dominating the odds, randomness will have the last word” – Nassim Taleb, Fooled by Randomness.

We generate a lot of data, and we have an abundance of data available to us. Data is forecast to continue to grow geometrically until kingdom come. There is little doubt that it will, as long as we humans and our “toys” are around to generate it. According to Statista Research, in 2021 a total of almost 80 zettabytes (ZB) of data is expected to have been created, captured, copied or consumed. That number corresponds to 900 years of Netflix viewing, or to every single person (ca. 8 billion people) having consumed 10 TB up to today (effectively since the early 2000s). It is estimated that there are 4.2 billion active mobile internet users worldwide. Of the total data, ca. 5% (ca. 4 ZB, or about 46 years of Netflix viewing) is being stored, and only around 2% of newly generated data is retained. Going forward, expectations are an annual growth rate of around 21%. The telecom industry (for example) expects an internet-connected device per square meter, monitoring and sensing its environment in real time; that includes you, me and your pet. Combined with your favorite smartphone, a super-advanced monitoring and data-collection device in its own right, the resolution of the human digital footprint will increase many-fold over the next years. Most of this data will be discarded, though not before the relevant metadata have been recorded and decided upon; not before your digital fingerprint has been enriched and updated, for business and society to use for their strategies and policies, for their data-enriched decision making, or for possible data-driven autonomous decision-making routines.

In a data-driven decision-making process, the data being acted upon can be stored data as well as non-stored data that is acted upon in real time.

This amount of existing and newly generated data continues to be heralded as extremely valuable. More often than not, the proof point is a reference to the value or turnover of the Big 5, abbreviated FAANG (before Google renamed itself to Alphabet and Facebook to Meta). “Data is the new oil” appears almost as often in presentations and articles on Big Data as Arnold Schwarzenegger does in talks on AI. Yet, more often than not, presenters and commentators on the value of data forget that the original comparison to oil was that, just like crude oil, data needs to be processed and broken down in order to extract its value. That value-extraction process, as with crude oil, can be dirty and cause primary as well as secondary “pollution” that may be costly, not to mention time-consuming, to get rid of. Over the last couple of years, some critical voices have started to question the environmental impact of our obsession with extracting meaning out of very big data sets.

I am not out to trash data science or the pursuit of meaning in data. Quite the contrary. I am interested in how to catch the real gold nuggets in the huge pile of data-dung and sort away the spurious, false (deliberately or accidentally faked) signals that lead to sub-optimal data-driven decisions or outright black pearls (i.e., death by data science).

Clearly, given the amount of data being generated in businesses, as well as in society at large, and the perceived value of that data, or more accurately, of the final end-state of the processed data (e.g., after selection, data cleaning, modelling, …) and the inferences derived from it, data-driven decision making must be a value-enhancing winner for corporations and society.

Data-driven corporate decision making.

What’s wrong with human-driven decision making? After all, most of us would very firmly declare (maybe even solemnly) that our decisions are based on real data. The challenge (and yes, often a problem in critical decision making) is that our brain has a very strong ability (maybe even preference) for seeing meaningful patterns, correlations and relationships in the data available to us digitally or committed to our memory from past experiences. The human mind has great difficulty dealing with randomness, spurious causality of events, and connectedness. Our brain will try to make sense of anything it senses: it will correlate, it will create coherent narratives of the incoherently observed, and it will replace correlations with causations to fit a compelling idea or belief. The brain will also filter out patterns and anomalies (e.g., gorillas crashing a basketball game) that do not fit its worldview or constructed narrative. The more out of place a pattern is, the less likely it is to be considered. Human decision making is frequently based on spurious associations that fit our worldview or preconceived ideas of a topic, while ignoring any data that falls outside our range of beliefs (i.e., “anomalies”). Any decision process involving humans will, in one way or the other, be biased. We can only strive to minimize that human bias by reducing the number of bias-insertion points in our decision-making process.

A data-driven business is a business that uses available and relevant data to make more optimized and better decisions than purely human-driven ones. It is a business that gives more credibility to decisions based on available data and structural reasoning. It is a business that may be less tolerant of emotive and gut-feel decision rationales. It hinges its business on rationality and on translating its data into consistent and less uncertain decisions. The data-driven business approaches the so-called “Mathematical Corporation” philosophy, where the human-driven aspects of decision making become much less important compared to algorithmic data-driven decisions.

It sounds almost too good to be true. So it may indeed be too good. It relies very much on having an abundance of reliable, unbiased and trustworthy (whatever that means) data, on which we can apply our unbiased data-processing tools and get out unambiguous analyses that will help us make clear, unbiased decisions; corporate decisions that are free of interpretation, emotions and biases. Disclaimer: this paragraph was intended to be ironic and maybe a bit sarcastic.

How can we ensure that we make great decisions based on whatever relevant data we have available? (Note that I keep the definition of a great decision a bit diffuse.)

Ideally, we should start with an idea or hypothesis that we want to test and act upon. Based on our idea, we should design an appropriate strategy for data collection (e.g., what statisticians call experimental design), ensuring proper data quality for our analysis, modelling and final decision. Typically, after the data collection, the data is cleaned and structured (both steps likely to introduce biases), making it easier to commit to computing: analysis and possibly statistical or mathematical modelling. The outcome of the analytics and modelling provides the insights that will be the basis for our data-driven decision. If we have done our homework on data collection, careful (and respectful) data post-processing, and understanding the imposed analytical framework, we can also judge whether the resulting insights are statistically meaningful, and whether our idea, our hypothesis, is relevant and significant and thus meaningful to base a decision upon. It seems like a “no-brainer” that the results of decisions are tracked and fed back into a given company’s data-driven process. This idealized process is depicted in the picture below.

The above depicts a very idealized data-driven decision process; let’s call it the “ideal data-driven decision process”. This process may provide better and more statistically sound decisions. However, in practice companies may follow a different approach to searching for data-driven insights that can lead to data-driven decisions. The picture below illustrates an alternative approach to utilizing the corporate and societal data available for decision making. To distinguish it from the above process, I will call it the “big-data-driven decision process”, and although I emphasize big data, it can of course be used on any sizable amount of data.

The philosophy of the “big-data-driven decision process” is that, with sufficient data available, pattern- and correlation-search algorithms will extract insights that subsequently lead to great data-driven decisions. The answer (to everything) is already in the big-data structure, and thus the best decision follows directly from our algorithmic approach. It takes away the need for a fundamental human understanding, typically via models, of the phenomena that we desire to act upon with the sought-after data-driven decision.

The starting point is the collected data available to a business or entity interested in using its data for business-relevant decisions. Data is not per se collected as part of an upfront idea or hypothesis. Within the total amount of data, subsets of data may sometimes be selected and often cleaned, preparing them for the subsequent analysis, the computing. The data selection process often happens with some (vague) idea in mind of providing backup, or substance, for a decision that a decision maker or business wants to make. In other instances, companies let pattern-search algorithms loose on the collected or selected data. Such algorithms are very good at finding patterns and correlations in datasets, databases and data stores (often residing in private and public clouds). Such algorithmic tools will provide many insights for the data-driven business or decision maker. Based on those insights, the decision maker can then form ideas or hypotheses that may support formulating relevant data-driven decisions. In this process, the consequences of a decision made may or may not be directly measured, missing out on the opportunity to close the loop on the business’s data-driven decision process. In fact, it may not even be meaningful to attempt to close the loop, due to the structure of the data required or the vagueness of the decision foundation.

The “big-data-driven decision process” rarely leads to the highest quality in corporate data-driven decision making. In my opinion, there is a substantial risk that businesses could be making decisions based on spurious (nonsense) correlations, falsely believing that such decisions are very well founded due to the use of data- and algorithm-based machine “intelligence”. Furthermore, this data-driven decision-making process, as described above, has a substantially higher number of bias-entry points than a decision-making process starting with an idea or hypothesis followed by a well-thought-through experimental design (e.g., as in the case of our “ideal data-driven decision process”). As a consequence, a business may incur a substantial risk of reputational damage, on top of the consequences of making a poor data-driven business decision.

As a lot of the data available to corporations and society at large is generated by humans, directly or indirectly, it is also prone to human foibles. Data is indeed very much like crude oil, in that it needs to be refined to be applicable to good decision making. The refinement process, while cleaning up data and making it digestible for further data processing, analytics and modelling, may also introduce other issues that ultimately result in sub-optimal decisions, data-driven or not. Thus, corporate decisions that are data-driven are not per definition better than ones that are more human-driven. They are ultimately not even that different, after the data has been refined and processed to a state that humans can actually act upon. It is important, however, to keep in mind that big data tends to contain many more spurious correlations and adversarial patterns (i.e., patterns that look compelling and meaningful but are spurious in nature) than meaningful causal correlations and patterns.

Finally, it is a bit of a fallacy to believe that, even if many corporations have implemented big-data systems and processes, decision-relevant data exists in abundance in those systems. Frequently, the amount of decision-relevant data is fairly limited, which may therefore also increase the risk and uncertainty of data-driven decisions made upon it. The drawback of small data is again very much the risk of looking at random events that appear meaningful. Standard statistical procedures can provide insights into the validity of small-data analyses and the assumptions made, including the confidence that we can reasonably assign to them. For small-data-driven decisions it is far better to approach the data-driven decision-making process according to the ideal process described above, rather than attempting to select relevant data out of a bigger data store.
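
As a small, hedged illustration of such standard checks (a minimal sketch with made-up data, not part of the survey analysis): with only a handful of observations, even a sizable sample correlation can easily be consistent with pure chance. The sketch below assumes NumPy and SciPy are available and uses SciPy's pearsonr together with a Fisher z-based confidence interval.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical small sample: 8 paired observations of two independent variables.
x = rng.normal(size=8)
y = rng.normal(size=8)

r, p_value = stats.pearsonr(x, y)   # sample correlation and its p-value
print(f"sample correlation r = {r:.2f}, p-value = {p_value:.2f}")

# Fisher z-transform gives an approximate 95% confidence interval for the correlation.
z = np.arctanh(r)
se = 1 / np.sqrt(len(x) - 3)
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)
print(f"approx. 95% CI for the correlation: [{lo:.2f}, {hi:.2f}]")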

Intuition about data.

As discussed previously, we humans are very good at detecting real, as well as illusory (imagined), correlations and patterns. Likewise, so are the statistical tools, algorithms and methodologies we apply to our data. Care must always be taken to ensure that the inferences (assumptions) being made are sensible and supported by statistical theory.

Correlations can help us make predictions of the effect one event may have on another. Correlations may help us to possibly understand relationships between events and possibly also their causes (though that is more difficult to tease out, as we will discuss below). However, we should keep in mind that a correlation between two events does not guarantee that one event causes the other, i.e., correlation does not guarantee causation. A correlation simply means that there is a co-relation between X and Y; that is, X and Y behave in a way (e.g., linearly) such that a systematic change in X appears to be followed by a systematic change in Y. As plenty of examples have shown (e.g., see Tyler Vigen’s website Spurious Correlations), a correlation between two events X and Y does not mean that one of them causes the other. They may really not have anything to do with each other. It simply means they co-relate in a way that allows us to infer that a given change in one relates to a given change in the other. Our personal correlation detector, the one between our ears, will quickly infer that X causes Y after it has established a co-relation between the two.

To tease out causation (i.e., action X causes outcome Y) in a statistically meaningful way, we need to conduct an experimental design, making appropriate use of randomization. It is not at all rare to observe correlations between events that we know are independent and/or have nothing to do with each other (i.e., spurious correlations). Likewise, it is also possible to have events that are causally dependent while observing a very small or no apparent correlation, i.e., corr(X,Y) ≈ 0, within the data sampled. Such a situation could make us wrongly conclude that they have nothing to do with each other.

Correlation is a mathematical relationship that co-relates the change of one event variable ∆X with the proportional change of another event, ∆Y = α∆X. The degree of correlation between the events X and Y can be defined as

corr(X, Y) = cov(X, Y) / (σ_X · σ_Y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / √[ Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² ]

with the first part (after the first equal sign) being the general definition of a correlation between two random variables, and the second part being specific to measurements (samples) related to the two events X and Y. If the sampled data does not exhibit a systematic proportional change of one variable as the other changes, corr(X,Y) will be very small and close to zero. For selective or small data samples, it is not uncommon to find the correlation between two events, where one causes the other, to be close to zero, and thus to “falsely” conclude that there is no correlation. Likewise, for selective or small data samples, spurious correlations may occur between two events where no causal relationship exists. Thus, we may conclude that there is a co-relation between the events, and subsequently we may also “falsely” believe that there is a causal relationship. It is straightforward to get a feeling for these cautionary issues by simulation using R or Python.
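
For instance (a minimal Python sketch with simulated, independent data; NumPy is assumed), we can draw many small samples of two independent variables and count how often a seemingly “meaningful” correlation shows up by pure chance:

import numpy as np

rng = np.random.default_rng(7)

n_samples, sample_size = 10_000, 10   # many small samples of two independent variables
spurious = 0

for _ in range(n_samples):
    x = rng.normal(size=sample_size)
    y = rng.normal(size=sample_size)     # independent of x by construction
    r = np.corrcoef(x, y)[0, 1]          # sample correlation
    if abs(r) > 0.5:                     # looks "meaningful" but is pure chance
        spurious += 1

print(f"{100 * spurious / n_samples:.1f}% of the small samples show |corr| > 0.5 "
      "between two independent variables")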

The central limit theorem (CLT among friends) ensures that, irrespective of the underlying distribution type, as long as the sample size is sufficiently big (e.g., >30), sample statistics (e.g., mean, variance, correlation, …) will tend to be normally distributed. The sample variance of the statistic narrows as the sample size increases. Thus, for very large samples, the sample statistic converges to the true statistic (of the population). For independent events the correlation between those events is zero (i.e., the definition of independent events). The CLT tells us that the sample correlations between independent random events will take the shape of a standardized normal distribution. Thus, there will be a non-zero chance that a sample correlation is different from zero, violating our expectation for two independent events. As said, our intuition (and the math) should tell us that as the sample size increases, the sample variance narrows increasingly around zero, which is our true expectation for the correlation of independent events. Thus, as the sample size grows, the spread of sampled correlations, that is, the spurious non-zero correlations, reduces to zero, as expected for a database that has been populated by sampling independent random variables. So all seems good and proper.
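
This narrowing can be checked directly (again a minimal sketch with simulated independent data, not a formal proof): the spread of sample correlations shrinks roughly as 1/√n as the sample size n grows.

import numpy as np

rng = np.random.default_rng(11)

for sample_size in (10, 30, 100, 1000):
    # 5,000 sample correlations between two independent variables
    corrs = [np.corrcoef(rng.normal(size=sample_size),
                         rng.normal(size=sample_size))[0, 1]
             for _ in range(5_000)]
    print(f"n = {sample_size:5d}: std of sample correlations = {np.std(corrs):.3f} "
          f"(approx. 1/sqrt(n) = {1 / np.sqrt(sample_size):.3f})")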

As more and more data is being sampled, representing diverse events or outcomes, and added to our big-data storage (or database), the chance of finding spurious correlations in otherwise independent data increases. Of course there may be legitimate (causal) correlations in such a database as well. But the point is that there may also be many spurious correlations, of obvious or much less obvious nonsensical nature, leading to data-driven decisions without a legitimate basis in the data used. The range (i.e., max – min) of the statistics (e.g., the correlations between two data sets in our data store) will in general increase as the number of data sets increases. If you have a data set with data on 1,000 different events, then you have almost half a million correlation combinations to trawl through in the hunt for “meaningful” correlations in your database. Searching (brute force) for correlations in a database with a million different events would result in half a trillion correlation combinations (i.e., approximately half the number of data sets squared, for large databases). Heuristically, you will have a much bigger chance of finding a spurious correlation than a true correlation in a big-data database.
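
A back-of-the-envelope check (a sketch with simulated, mutually independent series; the series count and sample size are illustrative): the number of pairs grows as m(m−1)/2, and the largest correlation found among independent series grows with the number of series trawled through.

import numpy as np

rng = np.random.default_rng(3)
sample_size = 50   # observations per series

for m in (10, 100, 1000):
    n_pairs = m * (m - 1) // 2                 # pairwise correlation combinations
    data = rng.normal(size=(m, sample_size))   # m mutually independent series
    corr = np.corrcoef(data)                   # m x m correlation matrix
    max_spurious = np.max(np.abs(corr[np.triu_indices(m, k=1)]))
    print(f"{m:5d} series -> {n_pairs:9d} pairs, "
          f"largest (purely spurious) |corr| = {max_spurious:.2f}")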

Does decision outcome matter?

But does it all matter? If a spurious correlation is persistent and sustainable, albeit likely nonsensical (e.g., the correlation between storks and babies born), a model based on such a correlation may still be a reasonable predictor for the problem at hand and maybe of (some) value … However, would we bet our own company’s fortune and future on a spurious, nonsensical correlation (e.g., there are more guaranteed ways of having a baby than waiting for the stork to bring it along)? Would we like decision makers to impose policy upon society based on such conjecture and arbitrary inference? I do not think so … That is, if we are aware of and have a say in such matters.

In the example above, I have mapped out how a data-driven decision process could look (yes, complex, but I could make it even more so). The process consists of 6 states (i.e., Idea, Data Gathering, Insights, Consultation, Decision, Stop) and actions that take us from one state to the other (e.g., Consult → Decision), until the final decision point where we may decide to continue, develop further or terminate. We can associate our actions with the likelihood (e.g., based on empirical evidence) that a given state transition (e.g., Insights → Consult vs Insights → Decision, …) occurs. Typically, actions are not symmetric, in the sense that the likelihood of going from action 1 to action 2 may not be the same as going from action 2 back to action 1. In the above decision-process illustration, one would find that over many decision iterations (or over time) we terminate an idea (or product) ca. 25% of the time, even though the individual transition, Decision → Stop, is associated with a 5% probability. Although this blog is not about “Markov decision processes”, one could associate reward units (which can be negative or zero as well) with each process transition and optimize for the best decision subject to the rewards or costs known to the corporation.

Though, let us also be real about our corporate decisions. Most decisions tend to be fairly incremental. Usually, our corporate decisions are reactions to micro-trends or relatively small changes in the business environment. Our decision making, and our subsequent reactions, are more often than not incremental in nature. It does not mean that we, over time, cannot be “fooled” by spurious effects, or by drift in the assumed correlations, that may eventually lead to substantially negative events.

The survey.

In order to survey the state of corporate decision making in general, and as it relates to data-driven decision making, I conducted a paid surveymonkey.com survey, “Corporate data-driven decision making and the role of Artificial Intelligence in the decision making process”. A total of 400+ responses were collected across all regions of the United States, census-balanced for gender and age (between 18 and 70), with an imposed annual household income of US$100k per annum. 70% of the participants hold a college degree or higher, and 54% of the participants describe their current job level as middle management or higher. The average age of the participants was 42 years. Moreover, I also surveyed my LinkedIn.com network as well as the Slack.com network associated with my Data Science Master of Science studies at the University of Colorado, Boulder. In the following, I only present the outcome of the surveymonkey.com paid survey, as this has been sampled in a statistically representative way based on the USA census and within the boundaries described above.

Basic insight into decision making.

Just to get it out of the way: a little more than 80% of the respondents believe that gender does not play a role in corporate decision making. That also means that a bit less than 20% believe that men or women are either better or worse at making decisions. 11% of the respondents (3 out of 4 women) believe that women are better corporate decision makers. Only 5% (ca. 3 out of 5) believe that men are better at making decisions. An interesting follow-up research topic would be decision making under stressed conditions. Though, this was not a focus of my questionnaire.

Almost 90% of the respondents are either okay with, enjoy, or love making decisions related to their business. A bit more than 10% do not enjoy making decisions. There are minor gender differences in the degree of appreciation for decision making, but it is statistically difficult to say whether these are significant or not.

When asked to characterize their decision-making skill in comparison with their peers, about 55% acknowledge that they are about the same as their peers. What is interesting (but not at all surprising) is that almost 40% believe that they are better at making decisions than their peers. I deliberately asked respondents to judge their decision abilities as “About the same” rather than average, but clearly did not avoid the so-called better-than-average effect often quoted in social judgement studies. What this means for the soundness of decision making in general, I will leave for you to consider.

Looking at gender differences in self-enhancement compared to peers, there are significantly more males believing they are better than their peers than is the case for female respondents. For both genders, 5% believe that they are worse than their peers at making decisions.

With the previous question in mind, let’s attempt to understand how often we consult with others (our peers) before making a business or corporate decision. A bit more than 40% of the respondents frequently consult with others prior to their decision making. In the survey, frequently has been defined as 7 out of 10 times or more. Similarly, a bit more than 40% would consult others in about half of their corporate decisions. It may seem a high share that do not seek consultation on half of their business decisions (i.e., glass half empty). But keep in mind that we also make a lot of uncritical corporate decisions that are part of our job description and might not be important enough to bother our colleagues or peers with (i.e., glass half full). Follow-up research should explore consultation on critical business decisions more carefully.

The gender perspective on consulting peers or colleagues before a decision-making moment seems to indicate that men seek such consultation more frequently (statistically significantly so) than women.

For many of us, our gut feel plays a role in our decision making. We feel a certain way about the decision we are about to make. Indeed, for 60% of the respondents their gut feeling was important in 50% or more of their corporate decisions. And about 40% of the respondents were of the opinion that their gut feel is better than that of their peers (note: these are not the same ca. 40% who believe that they are better decision makers than their peers). When it comes to gut feeling, its use in decision making and its relative quality compared to peers, there is no statistically significant gender difference.

The state of data-driven decision making.

How often is relevant data available to your company or workplace for your decision making?

And when such data is available for your decision-making process how often are you actually making use of it? In other words, how data-driven is your company or workplace?

How would you assess the structure of the available data?

and what about its quality?

Are you having any statistical checks done on your data, assumptions or decision proposals prior to executing a given data-driven decision?

I guess the above outcome might be somewhat disappointing if you are a strong believer in the Mathematical Corporation, with only 45% of respondents frequently applying more rigorous checks to validate their decisions prior to executing them.

My perspective is that if you are a data-driven person or company, assessing the statistical validity of the data used, the assumptions made and the decision options would be a good practice to perfect. However, not all decisions, even data-driven ones, may be important enough (in terms of generic risk exposure) to worry about statistical validity. Even if the data used for a decision is of a statistically problematic nature, and thus may add additional risk to or reduce the quality of a given decision, the outcome of your decision may still be okay, albeit not the best that could have been achieved. And even a decision made on rubbish data has a certain chance of being right or even good.

And even if you have a great data-driven corporate decision process, how often do we listen to our colleagues’ opinions and also consider those in our decision making?

For 48% of the respondents, human insight or opinion is very important in the decision-making process. About 20% deem the human opinion of some importance.

Within the statistical significance and margin of error of the survey, there do not seem to be any gender differences in the responses related to the data-driven foundation and decision making.

The role of AI in data-driven decision making.

Out of the 400+ respondents, 31 (i.e., less than 8%) had not heard of Artificial Intelligence (AI) prior to the survey. In the following, only respondents who confirmed having heard about AI previously were asked questions related to AI’s role in data-driven decision making. It should be pointed out that this survey does not explore what the respondents understand an artificial intelligence, or AI, to be.

As has been consistent since I started tracking people’s sentiment towards AI in 2017, more women than men appear to have a negative sentiment towards AI. Men, on the other hand, are significantly more positive towards AI than women. The AI sentiment hasn’t changed significantly over the last 4 years; maybe a slightly less positive sentiment and a more neutral positioning among the respondents.

Women appear to judge a decision-making-optimized AI to be slightly less important for their company’s decision-making process. However, I do not have sufficient data to resolve this difference to a satisfactory level of confidence. Though, if present, it may not be surprising, given women’s less positive sentiment towards AI in general.

In a previous blog (“Trust thou AI?”), I described in detail the human trust dynamic towards technology in general and cognitive systems in particular, such as machine learning applications and the concept of artificial intelligence. Over the years, the trust in decisions based on AI, which per definition would be data-driven decisions, has been consistently skewed towards distrust rather than trust.

Bias

Bias is everywhere. It is part of life, of being human, as well as of most things touched by humans. We humans have so many systematic biases (my favorites: availability bias, which I see pretty much every day, confirmation bias and framing bias … yours?) that lead us astray from objective rationality, judgement and good decisions. Most of these so-called cognitive biases we are not even aware of, as they work on an instinctive level, particularly when decision makers are under stress or time constraints in their corporate decision making. My approach to bias is that it is unavoidable, but it can be minimized and often compensated for, as long as we are aware of it and its manifestations.

In statistics, bias is relatively easy to define and compute:

Bias(θ̂, θ) = E[θ̂] − θ

Simply said, the bias of an estimator (i.e., of an estimated statistic) is the expected value of the estimator minus the true value of the parameter being estimated. For an unbiased estimator, the bias is zero (obviously). One can also relate the bias to the mean square error minus the variance of the estimator (i.e., Bias² = MSE − Var). Clearly, translating human biases to mathematics is a very challenging task, if at all possible. Mathematics can help us some of the way (sometimes), but it is not the solution to all issues around data-driven and human-driven decision making.
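
As a small numerical illustration of the definition (a sketch with simulated data, not part of the survey): the “divide by n” variance estimator is biased, while the “divide by n−1” version is not, and the bias shows up directly as E[θ̂] − θ.

import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0                      # the true parameter we try to estimate
n, n_trials = 5, 100_000            # small samples make the bias visible

biased, unbiased = [], []
for _ in range(n_trials):
    x = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=n)
    biased.append(np.var(x))            # divides by n   -> biased estimator
    unbiased.append(np.var(x, ddof=1))  # divides by n-1 -> unbiased estimator

print(f"bias of 'divide by n'   estimator: {np.mean(biased) - true_var:+.3f}")
print(f"bias of 'divide by n-1' estimator: {np.mean(unbiased) - true_var:+.3f}")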

Bias can be (and, more often than not, will be) present in data that is either directly or indirectly generated by humans. Bias can be introduced in the measurement process, as well as in the data selection and post-processing, and then in the modelling or analytics phase, via the assumptions and choices we humans make. The final decision-making stage, which we can consider the decision-thinking stage, the outcome of the data-driven process, comes together with the human interpretation and opinion part. This final stage also includes our business context (e.g., corporate strategy & policies, market, financials, competition, etc.) as well as our colleagues’ and managers’ opinions and approvals.

41% of the respondents believe that biased data is a concern for their corporate decision making. Given how much public debate there has been around biased data and its impact on public as well as private policy, it is good to see that a majority of the respondents recognize the concern of biased data in a data-driven decision-making process. If we attribute the “I don’t know” response to uncertainty, and this leads to questioning of bias in data used for corporate decision making, then all is even better. That said, I do find 31% having no concerns about biased data a relatively high number. It is somewhat concerning, particularly for decision makers involved in critical social policy or business decision making.

More women (19%) than men (9%) chose the “I don’t know” response to the above question. It may explain why fewer women chose ‘Yes’ to “biased data is a concern for decision making”, giving maybe the more honest answer of “I don’t know”. This is obviously speculation and might actually deserve a follow-up.

As discussed above, not only should the possibility of biased data be a concern for our data-driven decision making; the tools we use for data selection and post-processing may also be sources that introduce biases. Either directly, introduced by the algorithms used for selection and/or post-processing, or indirectly, via the human choices made and the assumptions introduced in the selected models or analytic frameworks used (e.g., parametrization, algorithmic recipe, etc.).

On the question “Are biased tools a concern for your corporate decision making?”, the answers are almost too nicely distributed across the 3 possibilities (“Yes”, “No” and “I don’t know”). This might indicate that respondents do not actually have a real preference or opinion, though more should then have ended up in “I don’t know” if that were really the case. It is a more difficult technical question and may require more thinking (or expert knowledge) to answer. It is also a topic that has been less prominently discussed in media and articles. The danger with tooling is of course that it is used as a black box for extracting insights, without the decision maker appreciating the possible limitations of such tools.

There seems to be a slight gender difference in the responses. However, the differences, both internal to this question and relative to the previous question around “biased data”, are statistically inconclusive.

After considering the possibility of biased data and biased tooling, it is time for some self-reflection on how biased we think we are ourselves, and to compare that with our opinion about our colleagues’ bias in decision making.

Almost 70% of the respondents in this survey are aware that they are biased in their decision making. The remainder either see themselves as unbiased in their decision making (19%, maybe a bias in itself … a blind spot?) or believe that bias does not matter (11%) in their decision making.

Looking at our colleagues, we attribute a higher degree of bias to their decisions than to our own. 80% of the respondents think that their colleagues are biased in their decision making. 24% believe that their colleagues are frequently biased in their decisions, as opposed to 15% of the respondents saying the same of their own decisions. Not surprisingly, we are also less inclined to believe that our colleagues are unbiased in their decisions compared to ourselves.

While there are no apparent gender differences in how the answers to the two bias questions are distributed, there is a difference in how we perceive bias for ourselves and for our colleagues. We tend to see ourselves as less biased than our colleagues, as observed in more respondents believing that “I am not biased at all in my decisions” compared to their colleagues (19% vs 12%), and in perceiving their colleagues as frequently being biased in their decisions compared to themselves (24% vs 15%). While causation is super difficult to establish in surveys such as this one, I do dare speculate that one of the reasons we don’t consult our colleagues on a large share of corporate decisions may be the somewhat self-inflated image of ourselves being better at making decisions, and less biased, than our colleagues.

Thoughts at the end

We may, more and more, have the technical and scientific foundation for supporting real data-driven decision making. It is clear that more and more data is becoming available to decision makers. As data stores, or databases, grow geometrically in size, and possibly in complexity as well, the human decision maker is either forced to ignore most of the available data, or to allow the insights for the decision-making process to be increasingly provided by algorithms. What is important in the data-driven decision process is that we are aware that it does not give us a guarantee that decisions made are better than decisions that are more human-driven. There are many insertion points in a data-driven decision-making process where bias can be introduced, with or without a human being directly responsible.

And for many of our decisions, the data available for our most important corporate decisions is either small data, rare data or not available at all. More than 60% of the respondents characterize the data quality they work with in their decision-making process as being Good (i.e., defined as uncertain, directionally okay with some bias, or of limited availability), Poor or Very Poor. About 45% of the respondents state that data is available for 50% or fewer of their corporate decisions. Moreover, when data is available, a bit more than 40% of the corporate decision makers use it in 50% or fewer of their corporate decisions.

Compared to the survey 4 years ago, this time around the respondents’ perception of bias in the decision-making process was included. About 40% were concerned about having biased data influence their data-driven decisions. Ca. 30% had no concern regarding biased data. Asked about biased tooling, only about 35% stated that they were concerned for their corporate decisions.

Of course, bias is not only limited to data and tooling, but extends to ourselves and our colleagues. When asked for a self-assessment of how biased the respondents believe themselves to be in their corporate decision making, a bit more than 30% either did not believe themselves to be biased, or believed that bias does not matter for their decisions. Ca. 15% stated that they were frequently biased in their decision making. Of course, we are often not the only decision makers around; our colleagues are as well. 24% of the respondents believed that their colleagues were frequently biased in their decisions. Moreover, for our colleagues, 21% (vs 30% in the self-assessment) believe that their colleagues are either not biased at all or that bias does not matter for their decisions. Maybe not too surprising, when respondents very rarely self-assess as being worse decision makers than their peers.

Acknowledgement.

I greatly acknowledge my wife, Eva Varadi, for her support, patience and understanding during the creative process of writing this blog. Also, many of my Deutsche Telekom AG, T-Mobile NL and industry colleagues in general have in countless ways contributed to my thinking and ideas leading to this little blog. Thank you!

Readings

Kim Kyllesbech Larsen, “On the acceptance of artificial intelligence in corporate decision making – A survey”, AIStrategyBlog.com (November 2017). Very similar survey to the one presented here.

Kim Kyllesbech Larsen, “Trust thou AI?”, AIStrategyBlog.com (December 2018).

Nassim Taleb, “Fooled by randomness: the hidden role of chance in life and in the markets”, Penguin books, (2007). It is a classic. Although, to be honest, my first read of the book left me with a less than positive impression of the author (irritating arrogant p****). In subsequent reads, I have been a lot more appreciative of Nassim’s ideas and thoughts on the subject of being fooled by randomness.

Josh Sullivan & Angela Zutavern, “The Mathematical Corporation”, PublicAffairs (2017). I still haven’t made up my mind whether this book describes an Orwellian corporate dystopia or a paradise. I am unconvinced that having more scientists and mathematicians in a business (assuming you can find them and convince them to join your business) would necessarily be of great value. But then again, I do believe very much in diversity.

Ben Cooper, “Poxy models and rash decisions”, PNAS vol. 103, no. 33 (August 2006).

Michael Palmer. “Data is the new oil”, ana.blogs.com (November 2006). I think anyone who uses “Data is the new oil” should at least read Michael’s blog and understand what he is really saying.

Michael Kershner, “Data Isn’t the new oil – Time is”, Forbes.com (July 2021).

J.E. Korteling, A.-M. Brouwer and A. Toet, “A neural network framework for cognitive bias”, Front. Psychol., (September 2018).

Chris Anderson, “The end of theory: the data deluge makes the scientific method obsolete”, Wired.com (June 2008). To be honest, when I read this article the first time, I was just shocked by the alleged naivety. Although I later came to understand that Chris Anderson meant his article as a provocation, “satire” is often lost in translation and in the written word. Nevertheless, the “correlation is enough” or “causality is dead” philosophy remains strong.

Christian S. Calude & Giuseppe Longo, “The deluge of spurious correlations in big data”, Foundations of Science, Vol. 22, no. 3 (2017). More down my alley of thinking and while the math may be somewhat “long-haired”, it is easy to simulate in R (or Python) and convince yourself that Chris Anderson’s ideas should not be taken at face value.

“Beware spurious correlations”, Harvard Business Review (June 2015). See also Tyler Vigen’s book “Spurious correlations – correlation does not equal causation”, Hachette Books (2015). Just for fun, and it provides ample material for cocktail parties.

David Ritter, “When to act on a correlation, and when not to”, Harvard Business Review (March 2014). A good business view on when correlations may be useful and when not.

Christopher S. Penn, “Can causation exist without correlation? Yes!”, (August 2018).

Therese Huston, “How women decide”, Mariner Books (2017). See also Kathy Caprino’s “How decision-making is different between men and women and why it matters in business”, Forbes.com (2016), based on an interview with Therese Huston. There is a lot of interesting scientific research indicating that there are gender differences in how men and women make decisions when exposed to considerable risk or stress; overall, there is no evidence that one gender is superior to the other. Though, I do know who I prefer managing my investment portfolio (and it’s not a man).

Lee Roy Beach and Terry Connolly, “The psychology of decision making“, Sage publications, (2005).

Young-Hoon Kim, Heewon Kwon and Chi-Yue Chiu, “The better-than-average effect is observed because “Average” is often construed as below-median ability”, Front. Psychol. (June 2017).

Aaron Robertson, “Fundamentals of Ramsey Theory”, CRC Press (2021). Ramsey theory accounts for the emergence of spurious (random) patterns and correlations in sufficiently large structures, e.g., big data stores or databases; spurious patterns and correlations that appear significant and meaningful without actually being so. It is easy to simulate that this is the case. The math is a bit more involved, although quite intuitive. If you are not interested in the foundational stuff, simply read Calude & Longo’s article (referenced above).

It is hard to find easy-to-read (i.e., non-technical) textbooks on Markov chains and Markov Decision Processes (MDPs). They tend to cater to people with a solid mathematical or computer science background. I do recommend the following YouTube videos: on Markov chains in particular, I recommend Normalized Nerd’s lectures (super well done and easy to grasp, respect!). I recommend having a Python notebook on the side and building up the lectures there. On Markov Decision Processes, I found the Stanford CS221 YouTube lecture by Dorsa Sadigh reasonably passable, though you would need a good grasp of Markov chains in general. Again, coding in parallel with the lectures is recommended to get a hands-on feel for the topic as well. After those efforts, you should get going on reinforcement learning (RL) applications, as those can almost all be formulated as MDPs.

A hands-on guide to data-driven business decisions – How to gain insights into an uncertain world.

“Would you like an adventure now or shall we have tea first”, Alice in Wonderland (Lewis Carroll).

Google’s parent company Alphabet breached a US$2 (~ €1.8) trillion valuation in November of this year (2021). What few realize is that this valuation, as well as the success of Google, is based on a Markov Decision Process and its underlying Markov chain, making PageRank, the brainchild of Sergey Brin and Larry Page, likely the most valuable and most used application of a Markov chain ever, period. We all make use of a Markov chain and a Markov Decision Process pretty much every day, many, many times a day. The pages of the world wide web, with its 1.5+ billion indexed pages, can be designated states of a humongous Markov chain. And the more than 150 billion hyperlinks between those web pages can be seen as the equivalent of Markov state transitions, taking us from one state (web page) to another state (another web page). The Markov Decision Process value and policy iteration ensures that the consumers of Google’s search engine get the best search results in the fastest possible way. Many hyperlink paths may lead to your page of interest, but many (maybe most) of those paths will not get you where you want to be fastest. Optimization algorithms developed on the world-wide-web Markov chain will find the best path to the content you want (or the content the algorithm “thinks” you want). I cannot think of a better motivating example for business people to get acquainted with data-driven decision processes and the underlying Markov chains than that.
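
To make the Markov-chain view of the web concrete (a toy sketch, not Google’s actual algorithm or data): a PageRank-style score can be computed by repeatedly applying a link-derived transition matrix, exactly as we will do for the decision processes below. The four-page “web” and the damping factor of 0.85 are illustrative assumptions.

import numpy as np

# Toy "web": links[i] lists the pages that page i links to (illustrative only).
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = 4

# Build the link-following transition matrix (each row sums to 1).
T = np.zeros((n, n))
for page, outlinks in links.items():
    T[page, outlinks] = 1 / len(outlinks)

# Damped PageRank: with probability 0.85 follow a link, otherwise jump to a random page.
d = 0.85
G = d * T + (1 - d) / n

rank = np.full(n, 1 / n)           # start with equal rank for every page
for _ in range(100):               # power iteration towards the steady state
    rank = rank @ G

print(np.round(rank, 3))           # the pages' long-run visit probabilities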

In the course of my analysis of corporate decision makers’ sentiment towards data-driven decision making and processes, it became clear that there is very little comprehensive material (for business people) on structured, hands-on approaches to decision processes. How do you go about implementing a data-driven decision process in your company? And this despite the fact that such approaches can be fantastic tools for structuring and optimizing a business’s decision-making processes. Not only that: we can integrate such decision-process algorithms into our digital platforms, achieving a very high degree of autonomous business decisions (e.g., Google’s PageRank decision process), which in turn will enhance our customers’ experience. It also allows us to monitor the overall quality of customer interactions with our digital environment(s) (e.g., web-based environments, user interfaces, closed-loop user experience, digital sales, apps, bots, etc.). Such data-driven integration would result in more efficient business processes, internally and towards external customers and partners.

So my goal with this blog is to take you through the essentials, the enablers, to better understand what a decision process may look like and its wide applicability to business-relevant decision making.

I like hands-on. The focus of this blog will be on providing coding examples for the reader to implement and play with in parallel with the examples given, and, maybe even better, to encourage you to create your own examples relevant to your area of business interest. I will take you through two major enablers for digitizing data-driven decision processes. One enabler is the so-called Markov chain, which is essential for understanding the other, the Markov Decision Process (MDP). I will attempt to do this intuitively, via coding examples, rather than by writing down a lot of mathematical notation (which is the normal approach). I will demonstrate some relatively simple business examples that hopefully will create an appetite to learn much more about this field.

At the very end of this blog, you can either stop reading or you can continue (an MDP maybe?) and find a more formalistic treatment of Markov chains and MDPs. Although somewhat more “long-haired” (coming from a bald guy), I have attempted to make it accessible to readers who may not have a math degree.

In my previous blog, “Data-driven decision making … what’s not to like about that?”, I wrote about data-driven decision making for businesses and public institutions. I described, or more accurately pulled out of my magic hat, a so-called Markov Decision Process that we could view as a data-driven decision process:

In the example above, I mapped out how a data-driven decision process might (or should) look. The process consists of 6 states or stages (i.e., Idea, Data Gathering, Insights, Consultation, Decision, Stop) and actions that take us from one state to the other (e.g., Consult → Decision), until the final Decision state, where we may decide to continue, to develop further (thus going back to Idea), or to terminate (i.e., Stop). We can associate the transitions and the associated actions with likelihoods, based on empirical evidence that fancy people may describe as process mining, of a given state transition (e.g., Insights → Consult vs Insights → Decision, …) occurring.

The described decision process is not static. It is dynamic and is likely to evolve over time. Transitions from one state to another can be seen as moving forward in time increments. You find yourself in one stage of the process and make up your mind, that is, you make a decision or take an action, about which state you “want” to move to next, with a given likelihood of making such a decision if there are several stages to move to, depending on the level of stochasticity. The rules of how to move from one state to the next are given by the scenario you are considering.

The above illustration may (or may not) look complex, so let’s break down, at a higher level, what we see. We have circles that represent a Stage in the decision process, e.g., Idea, Data Gathering, Insights, Consult, Decision, Stop. These stages are what we could also call States, which is the official name if we speak Markovian; I will use stage and state interchangeably. In between the stages we have white-colored arrows that take us, or Transition in Markovian, from one stage to the next (e.g., from Insights to Consult). A transition from one stage to the next is triggered by an Action (e.g., a decision) in the stage we are in, which leads us to the following stage in our process. For each transition and associated action we can assign a Transition Probability that describes the likelihood of that particular transition-and-action combination taking place. For example, in the above chart, I have a transition probability from the Insights stage to the Consult stage of 40%. I can also associate a reward R with a given transition. This reward is “banked” as you enter the new stage. For example, I can expect a reward of €R23 (see the illustration above) as I transition from the Insights to the Consult stage. An arrow that goes back to its own stage simply means that a decision, or action, is taken that results in us remaining in that particular state (e.g., a 20% chance of remaining in the Decision stage, which here implies that we continue as is). As transitions are associated with probabilities, it follows that the sum of the transition probabilities (arrows) leaving a state is required to equal 1. It is possible to define several transition-probability scenarios per state. Such scenarios are called policies in the language of Markov. The idea is that we can run through several scenarios, or policies, to determine whether some are better than others, and thus optimize our decision-making process (assuming the optimal policy is also feasible). Typically, actions, (state) transitions and associated rewards are not symmetric, in the sense that the likelihood (and reward) of going from State 1 to State 2 may not be the same as going from State 2 back to State 1.

What I have described above is the fundamental setup of a so-called Markov chain, and how such a chain can be extended with actions (e.g., decisions), rewards (e.g., income and cost) and policy (i.e., scenario) estimation and optimization; thus, into what we call Markov Decision Processes (MDPs) or their “cousins”, Markov Reward Processes (MRPs).

The question remains, of course: how do we actually do some meaningful analysis on a decision process as illustrated above? How do we code this?

While you will find some examples in the public domain on the analysis of Markov chains, it becomes a bit more technical as you move up the “intellectual food chain” to Markov Decision Processes. I have provided some simple, but generalized, Python code throughout this blog. It will allow you to run some of this analysis yourself and gain a lot of insight into Markov chains and (Markov) Decision Processes.

A simpler customer life-cycle & retention example.

Let’s take a simple example of the customer life-cycle of a typical subscription (e.g., online, magazine, services, …). We start our process with the Conversion of a prospective customer into a real customer, kicking off the customer life-cycle. After the sale, the customer starts the service, which below is defined as Retention; we want to keep our customer on that service. During the retention period, or life as a customer, the illustration below assumes 3 events may happen: (1) our customer is okay with the service and wishes to continue as is (i.e., remains in the Retention stage), (2) our customer is interested in adding additional features to the existing subscription, thus accepting an Upsell, after which the customer falls back into the Retention phase, and finally (3) our customer may decide to discontinue the subscribed service, thus Churn, ending the engagement and the customer life-cycle process. In fact, once the Churn state has been reached, it cannot be left. We call such a state an absorbing state. A Markov chain that includes at least one absorbing state is called an absorbing Markov chain.

In the above customer life-cycle illustration, we have 4 states (or stages): Conversion (S0), Retention (S1), Upsell (S2) and Churn (S3). The transition from one stage to the next (e.g., Retention → Upsell) is driven by a decision (i.e., action) to do so. Each transition is associated with the likelihood that that particular decision is made, e.g., there is a 20% chance that a customer decides to accept an Upsell when located in the Retention state.

In Python code, we can operationalize the process as follows:

# Customer life-cycle process (simple)
import numpy as np

# Define states
states = {
    0: 'Conversion',
    1: 'Retention',
    2: 'Upsell',
    3: 'Churn'
}

c = 0.05      # Churn likelihood
u = 0.20      # Upsell likelihood

# Transition probability matrix (each row sums to 1):
T = np.array([
    [0.00, 1.00,  0.00, 0.00],   # Conversion -> Retention
    [0.00, 1-u-c, u,    c   ],   # Retention  -> Retention / Upsell / Churn
    [0.00, 1.00,  0.00, 0.00],   # Upsell     -> Retention
    [0.00, 0.00,  0.00, 1.00]])  # Churn      -> Churn (absorbing)

Each row (and column) in the transition probability matrix represents a stage in the process. Note that each row is required to sum to 1. Our curiosity about the dynamics of our process, our chain of events, should lead us to ask: “In the long run, what is the chance of ending up in each of the defined states of our process?” Actually, for our customer life-cycle process, we might like to understand what our long-term churn proportion is, given the current scenario, and maybe even the trade-off between the upsell and churn rates.

How do we go about answering that question?

Our process has reached steady state when the likelihood of ending up in a given state no longer changes with the subsequent evolution of time. What this means is that, for a given overall process state represented by π, applying the transition matrix no longer changes that overall process state (i.e., πT = π). Thus, π then represents the expected likelihood of being in a given state once steady state has been reached.

In our illustration above, we can kick off our process with π0 = [1, 0, 0, 0], which represents that our initial stage is the Conversion state. Applying the transition matrix T to our initial state will transition it into the Retention stage, i.e., π0T = [0, 1, 0, 0] = π1. The next step then is π1T = [0, 0.75, 0.20, 0.05] = π2, and so forth, until πT = π for all subsequent time steps (algorithmically we can describe this iterative process as πT ← π). Following this recipe, we can create a small piece of Python code that will do the work for us:

def steady_state(pi, T, epsilon=0.01):
    # MARKOV CHAIN STEADY STATE.
    # pi      : Given n states, pi is an array of dim (n,).
    # T       : The transition probability matrix of dim (n,n).
    # epsilon : Provides the convergence criterion.

    j = 0   # Iteration counter

    while True:

        oldpi = pi

        pi = pi.dot(T)   # One application of the transition matrix

        j += 1

        # Check convergence
        if np.max(np.abs(pi - oldpi)) <= epsilon:
            break
        # In case of no convergence
        if j == 1000:
            break

    return pi   # Returning the likelihood of the steady-state states.

Using the above code, the transition matrix given above and our customer life-cycle process initial condition π0 = [1, 0, 0, 0], we find that

# Finding Customer life-cycle Steady-State
pi0 = np.array([1, 0, 0, 0])


pi = steady_state(pi0, T)  # Using above steady-state function


print('steady-state pi =', np.round(pi, 2))

output>> steady-state pi = [0, 0.19, 0.04, 0.77]  

So, within the existing customer life-cycle scenario a churn transition probability of 5% will in the long run lead to an overall churn likelihood of almost 80%. This is with a Retention transition probability of 75% and an Upsell transition probability of 20%. It may be a surprising outcome that a relatively low-probability action (or event) can lead to such a dominant business impact. Intuitively, we should remember that churn is a terminal event. Retention and even Upsell simply continue to operate within the life-cycle process with the ever-present “doom” of a customer leaving by churning. A more detailed analysis of this process shows that as long as we keep the Churn transition probability below 1.3%, the Churn state’s steady-state likelihood stays below 3.3 times the transition probability. Above a 1.3% transition probability, the steady-state Churn likelihood rapidly increases to 10 – 20 times the base transition probability. Needless to say, the current policy, as represented by the transition matrix, would require optimization in order to minimize the churn impact, and ideally we would need to aim for measures keeping churn below 1.3%.
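
If you would like to reproduce this kind of sensitivity analysis yourself, here is a minimal sketch (reusing the steady_state() function and the upsell likelihood u defined above; the exact ratios you obtain will depend on the convergence criterion epsilon used in steady_state());

# Sketch: sweep the churn likelihood c and compare the steady-state churn share
# to the churn transition probability.
for c in [0.005, 0.01, 0.013, 0.02, 0.05]:
    T_c = np.array([
        [0.00, 1.00,  0.00, 0.00],
        [0.00, 1-u-c, u,    c   ],
        [0.00, 1.00,  0.00, 0.00],
        [0.00, 0.00,  0.00, 1.00]])
    pi_c = steady_state(np.array([1, 0, 0, 0]), T_c)
    print('c =', c, ' churn share =', np.round(pi_c[3], 2), ' ratio =', np.round(pi_c[3]/c, 1))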

The purpose of this blog, as stated in the beginning, is not so much to study the dynamics of a hypothetical customer life-cycle management process, or any other for that matter. The purpose is to provide you with an understanding of decision process modelling, the analysis possible, and the tools to go and do that yourself.

A simple customer life-cycle example with rewards.

How do we get from the process dynamics to an assessment of the value of a given strategy or policy? There are several dimensions to this question. Firstly, we would of course like to enhance the value of the overall decision process. Next, we would also like to ensure that each stage of our process has been optimized in terms of value, with value being monetary, time-wise, number of stages, order of states, etc.

I will provide you with a reasonably simple approach as well as the code to work out your own examples. It is in general straightforward to add complexity to your process and still be able to use the provided code.

In the above illustration two different scenarios are provided. Both scenarios are represented by the same transition probability matrix. This is not a requirement, but it makes the example simpler. Policy 1 mimics an “annual subscription model” with a subscription value of 100 per annum, an upsell value of 20 per annum and a (one-time) loss of value by churn of -100. Policy 2 represents a “monthly subscription model” with a subscription value of 10 per month, an upsell value of 2 per month and a (one-time) loss of value by churn of -60. I would like to know what the overall expected value is for each of the two scenarios.

However, Houston, we (may) have a problem here. Remember that once we end up in the Churn state we are stuck in that state (i.e., we don’t “un-churn” … at least not in this process). From a transition probability matrix perspective, once we are in the churn state every new iteration transitions back to that same state. In fact, as already discussed, the churn state acts as an absorbing state, making our customer life-cycle Markov chain an absorbing Markov chain. In this particular case, we would like to assign a one-time penalty (i.e., negative reward) to the churn state and then end the process. Of course this may not always be the case, but here it is. From a (state) transition matrix perspective we have created our own equivalent of “Groundhog Day”: we keep returning to the same state and thus might end up multiplying the churn penalty by the number of iterations it takes us to reach convergence of the overall system, unless we are careful. We have two solutions: (1) choose an appropriately small churn penalty that, with a given discount factor (let’s call it γ), ensures that the penalty quickly converges to a reasonable figure (yeah … not the nicest solution, but it could work), or (2) introduce an End state with zero reward/penalty that the churn state transitions into with probability 1. This pushes the absorbing state away from the churn state, which is convenient for the valuation estimation process, and it ensures that when we end up in the churn state its penalty is only counted once. This will be an important consideration as we commence on value iteration for a Markov Decision Process. So, our above illustration needs a bit of revision and gets to look like this,

and our Python implementation of the states, transition matrix and reward vector will look like this,

# Customer life-cycle process (simple)
# Define States


states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn',
    4 : 'End'
}


c = 0.05      # Churn likelihood
u = 0.20      # Upsell likelihood


#Transition probability matrix:
T = np.array([
    [0.00, 1.00, 0.00, 0.00, 0.00],
    [0.00, 1-u-c, u,    c,   0.00],
    [0.00, 1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 1.00],    # Churn transition to End State
    [0.00, 0.00, 0.00, 0.00, 1.00]])   # End state

If you run the 5-state transition matrix through the above Python steady_state() function, you get π = (0 0.18 0.04 0.01 0.77), where the last two states (i.e., the Churn and End states) are really part of the same state. Mapping it back to the 4-state model we have π = (0 0.18 0.04 0.78), which is very close to the above “true” 4-state model (i.e., with the churn state also being the end state), π = (0 0.19 0.04 0.77). From π we see that we have a steady-state likelihood of 77% of ending up in the absorbing state in our customer life-cycle process.

For value estimation purposes, we would like to ignore the absorbing state and renormalize the steady-state vector π. Thus, π’ = (0 0.19 0.04 0.01 0)/sum((0 0.19 0.04 0.01 0)), which results in a renormalized steady-state vector π’ = (0, 0.79, 0.17, 0.04, 0) that we will use to assess the long-run average value of our customer life-cycle process.
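
As a minimal sketch (reusing the steady_state() function and the 5-state transition matrix T defined above), the renormalization can be done like this;

# Renormalize the steady-state vector, ignoring the absorbing End state (index 4).
pi5 = steady_state(np.array([1, 0, 0, 0, 0]), T)
pi_prime = pi5.copy()
pi_prime[4] = 0.0                     # Zero out the absorbing End state.
pi_prime = pi_prime / pi_prime.sum()  # Renormalize to sum to 1.
print(np.round(pi_prime, 2))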

So let’s do a bit of a random walk across our decision process as depicted above. I will make use of repeated random sampling (also called Monte Carlo simulation among friends) on my decision process. The transition matrix will bring me from one stage to another at a rate given by the respective transition probability matrix (T). The reward vector (R), see the above illustration, provides the reward expectation for each state (e.g., R = [0, 100, 20, -100, 0] for Policy 1). As the repeated random sampling progresses and the various decision process states are visited, we add up the respective states’ rewards until we end up in the Churn state, which ends the simulation. The simulation is then repeated, and the average of all the simulated process values provides our expected overall value for the decision process. We have the option to discount future values with gamma (γ < 1);

V = Σ_{i=0}^{n*} γ^i ∙ R(s_i), with n* = min{ N, n @ EndState }, where N is the maximum allowed number of simulation time steps and “n @ EndState” is the time step at which our simulation ends up in the EndState, terminating the simulation. This principle is similar to a financial net present value calculation, which values the present more than the future at, for example, the weighted average cost of capital (WACC) or a discount rate. It also turns out to be mathematically convenient to include a discount rate, as it ensures that our value iteration converges.

The Python code for the described simulation is provided here;

def mc_random_walk_reward(states, state0, T, R, gamma=0.9, tot_steps=100, EndState='Churn'):
    # EXPECTED POLICY & VALUE ASSESSMENT VIA RANDOM WALKS.
    # states    : Dictionary with the names of the n states being studied.
    # state0    : The initial state (index) of the random walk.
    # T         : The transition probability matrix of dim (n,n).
    # R         : The reward array of dim (n,), e.g., can have several columns pending
    #             policies.
    # gamma     : Value discount factor, < 1.0.
    # tot_steps : Maximum number of time step iterations unless EndState is
    #             encountered.
    # EndState  : The end state that stops the simulation unless tot_steps is
    #             reached first.

    n = tot_steps
    stp = 0
    start_state = state0
    path = [states[start_state]]
    value_state = R[state0]*(gamma**0)

    prev_state = start_state

    i = 1
    while n:
        curr_state = np.random.choice(list(states.keys()), p = T[prev_state])
        value_state += R[curr_state]*(gamma**i)
        path += [states[curr_state]]
        prev_state = curr_state
        i += 1
        n -= 1
        if states[prev_state] == EndState:
            stp = n + 1   # Remaining step budget when the EndState was reached.
            break

    return (path, value_state, stp)

The transition matrix remains the same; thus we only need to define the reward vectors;

# Reward vector for Policy 1 - Annual Subscription Model.

R1 = np.array([
    [0],
    [100],
    [20],
    [-100],
    [0]])

# Reward vector for Policy 2 - Monthly Subscription Model.

R2 = np.array([
    [0],
    [10],
    [2],
    [-60],
    [0]])

Thus, we are ready to run the decision process valuation;

j = 0
stop = 10000 # Total number of Monte Carlo simulations


df_value = []
df_time = []


# For R use R1 for Policy 1 and R2 for Policy 2.
R = R2   # Here: Policy 2, the monthly subscription model.

while True:
    path, value, t_stop = mc_random_walk_reward(states, 0, T, R, 0.90, 100, 'Churn')
    df_value.append(value)
    df_time.append(t_stop)
    j += 1
    if j == stop:
        break

unit_time = 1 # 1 month per unit time iteration.

print('Expected Total Value of Policy 2: ', np.round(np.mean(df_value)/unit_time,0))
print('Average time to Churn of Policy 2: ', np.round(np.mean(df_time),0))

output>> Expected Total Value of Policy 2:  42.0
output>> Average time to Churn of Policy 2:  77.0

The expected total value of Policy 2 comes out at ca. 42. Applying the above code to Policy 1 gets us an expected total value of 46, thus a bit more attractive than Policy 2. This provides a fairly easy way to get an idea about a given decision process’s value. As we set the maximum number of time step iterations (e.g., 100 in the code snippet), it is good to check that the average time to churn over all the Monte Carlo simulations stays below this number (e.g., 77 < 100 in the above code snippet). It is wise to play a bit with the maximum number of steps to check whether the expected total value of your policy changes significantly.

It should be noted that since the Monte Carlo simulation of the Markov chain terminates upon Churn, the churn penalty is only accounted for once, as is appropriate in this particular example. In other words, for this part we would not, strictly speaking, require the fifth state to end the process.

If we compare the expected value of our process of 42.0 (Policy 2) with the Value Iteration algorithmic approach (more on that below), used on Markov Decision Processes, we find steady-state state values of V[Policy 2] = [42.2, 46.9, 44.2, -60, 0] = [Conversion, Retention, Upsell, Churn, End], with the End state representing our absorbing state. Using our renormalized steady-state vector π’ = (0.00, 0.79, 0.17, 0.04, 0.00), we find that our long-run average value is

G = V ∙ π’ = 42.2 ∙ 0.00 + 46.9 ∙ 0.79 + 44.2 ∙ 0.17 + (-60) ∙ 0.04 + 0 ∙ 0.00 = 42.2
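
In Python this is just a dot product. A small sketch, plugging in the value-iteration state values and the renormalized steady-state vector quoted above;

# Long-run average value G of Policy 2.
V_policy2 = np.array([42.2, 46.9, 44.2, -60.0, 0.0])   # State values from value iteration.
pi_prime  = np.array([0.00, 0.79, 0.17, 0.04, 0.00])   # Renormalized steady-state vector.
G = V_policy2.dot(pi_prime)
print(np.round(G, 1))   # ≈ 42.2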

This is in pretty good agreement with the Monte Carlo simulations of our customer life-cycle process, and even better … a much faster way of getting to the long-run value. However, if you are dealing with one (or several) absorbing states, some caution in how to compensate for those is warranted. In general, I like to run a Monte Carlo process simulation just to ensure that my value iteration extraction logic is correct.

Decision process optimization … Dual policy and value iteration.

We are often in situations where we need to evaluate different assumptions or policies in our decision process. Imagine you are looking at a subscription process, as shown below, where you have broken down the customer life-cycle into 3 parts (i.e., states or stages); (1) Start of subscription, (2) Average life-time and (3) Long-time subscriptions. You are contemplating two different policies; Policy 0 (white arrows): no churn intervention, and Policy 1 (orange arrows): churn intervention measures at each stage in the subscription process. After a churn intervention, at a given state, your system will treat the customer as a new customer (note: this might not be the smartest thing to do, but it is the easiest to illustrate).

The decision process for such a customer life-cycle is illustrated below. I have deliberately not added an End state after churn in this example (i.e., strictly speaking, once we end up in the Churn state we are in our “Groundhog Day” state of “perpetual” churn … which is btw why we like to call it an absorbing state). Adding it would add complexity to the Markov chain, and the purpose of this example is to show policy and value optimization in a situation where we have 2 policies to consider in our decision process.

You would like to know what the value is at each state, given the observed churn. Moreover, you would also like to know when (time-wise) and where (state-wise) it might become important to shift from one policy to the other. So let’s (Python) code the above customer loyalty decision process;

# Customer life-cycle process - dual value and policy iteration
# Define States

# Subscription states

states = {
    0 : 'Start',
    1 : 'Average life-time',
    2 : 'Long-time',
    3 : 'Churn'
}


# Transition probability matrices (one per policy):


c = 0.05 # Churn rate; try to increase this in steps of 0.05
         # and you will see that values and policies begin to change
         # as the churn rate increases.


# Policy 0 - No churn intervention measures.
policy_0 = np.array([
    [0.00, 1-c, 0.00, c],
    [0.00, 0.00, 1-c, c],
    [0.00, 0.00, 1-c, c],
    [0.00, 0.00, 0.00, 1.00]
    ])


# Policy 1 - Churn intervention measures.
policy_1 = np.array([
    [1.00, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00]
    ])


T = np.array([policy_0, policy_1]) # Stacked transition probability matrices.


# Reward array for the two policies
# 1st column reflects rewards under Policy 0
# 2nd column reflects rewards under Policy 1

R = np.array([
    [0, 0],
    [0, 1],
    [4, 2],
    [-1, -1]])


print(states)
print(T)
print(R)
  

In order to get the optimal decision process value and the respective policy, we make use of an iterative algorithm called Value Iteration. The value iteration procedure provides the values of the various states of our decision process, given known transition probabilities and their corresponding rewards. The same algorithm also provides the optimal policy matching the optimum values (i.e., Policy Iteration). I have implemented the value and policy iteration algorithm in the code below.

def mdp_valueIteration(states, T, R, gamma = 0.90, epsilon = 0.01):
    # VALUE AND POLICY ITERATION
    # states  : Dictionary with the names of the n states being studied.
    # T       : The transition probability matrix, dim (n,n) or (n_policies,n,n).
    # R       : The reward array of dim (n,), e.g., can have several columns pending
    #           the number of policies.
    # gamma   : Value discount factor, < 1.0.
    # epsilon : Provides the convergence criterion.

    # Initialize V_0 to zero
    values = np.zeros(len(states))

    ctime = 0

    # Value iteration
    # Continue until convergence.

    while True:

        # To be used for convergence check
        oldValues = np.copy(values)

        values = np.transpose(R) + gamma*np.dot(T,values)   # Value iteration step

        policy = values.argmax(0)   # Take the best policy.
        values = values.max(0)      # Take the highest value.

        ctime += 1

        # Check convergence
        if np.max(np.abs(values - oldValues)) <= epsilon:
            break

    return (values, policy, ctime)

All we have to do is call the above “mdp_valueIteration” function with the stacked transition probability matrices and the respective reward columns for Policy 0 and Policy 1.

# Call ValueIteration function and get optimum value per state and
# Optimum Policy strategy per state.

values, policy, time = mdp_valueIteration(states,T,R,gamma = 0.90)


print('Optimum State Values: ', np.round(values,0))
print('Optimum Policy      : ', policy)
print('Optimum Time steps  : ', time)

# Output for Churn rate c = 0.05

output>> Optimum State Values: [17  21  25  -10]
output>> Optimum Policy      : [ 0   0   0    0]

# Output for Churn rate c = 0.3

output>> Optimum State Values: [ 0   1   4  -10]
output>> Optimum Policy      : [ 1   1   0    0]

We have 4 states defined in our subscription decision process: “Start of subscription” (S0), “Average life-time subscription” (S1), “Long-time subscription” (S2) and “Churn” (S3). Not surprisingly, we find that the highest value is delivered by the subscribers in the long-time subscription category. For relatively low churn rates, the policy without churn intervention measures (i.e., Policy 0) is the most appropriate. As the churn rate increases, we observe that the best policy (in terms of value optimization between the two policies) for State 0 and State 1 is to have a churn intervention policy (i.e., Policy 1) in place.

In summary, with a more detailed analysis of our dual-policy customer life-cycle decision process we find the following process dynamics as a function of the churn rate.

For comparison, I have also tabulated the value outcome in the case that we do not consider a churn mitigation policy.
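
If you would like to reproduce this kind of churn-rate sweep yourself, a minimal sketch (reusing the states, the reward array R and the mdp_valueIteration() function from above) could look like this;

# Sketch: sweep the churn rate and watch the optimal policy per state change.
for c in [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]:
    policy_0 = np.array([
        [0.00, 1-c, 0.00, c],
        [0.00, 0.00, 1-c, c],
        [0.00, 0.00, 1-c, c],
        [0.00, 0.00, 0.00, 1.00]])
    policy_1 = np.array([
        [1.00, 0.00, 0.00, 0.00],
        [1.00, 0.00, 0.00, 0.00],
        [1.00, 0.00, 0.00, 0.00],
        [0.00, 0.00, 0.00, 1.00]])
    T = np.array([policy_0, policy_1])
    values, policy, ctime = mdp_valueIteration(states, T, R, gamma=0.90)
    print('c =', c, ' optimum policy per state:', policy, ' state values:', np.round(values, 1))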

Using the provided code and examples, it is straightforward to consider more complex decision process applications. The code I have provided is quite general, and your work will be in defining your process’s underlying Markov chain with its transition probabilities and the respective decision process rewards (positive as well as negative). Once that is set up, it is a matter of referencing the provided code functions.

The hidden Markov model … An intro.

We are often in the situation that we get customer feedback (observable) without being completely sure which underlying (“hidden”) processes may have influenced the customer to provide that particular feedback. A customer may interact in a particular way with our web store (again, observable). That customer interaction is likely also influenced by hidden processes and behaviors of the underlying system dynamics (e.g., user interface architecture, front-end interactions, back-end interactions, etc.).

Let’s add some observable “happiness” and “unhappiness” to our customer retention and life-cycle process from the beginning of this blog.

We have already defined the transition probability matrix above. We also need to write the observable sentiments associated with the states into matrix form (this is called the emission matrix in Markovian).
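
Written out in Python (the same emission matrix as coded in the example further below), with rows being the hidden states Conversion, Retention, Upsell and Churn, and columns the observables Happy and Not Happy;

# Emission matrix E: likelihood of an observable given the hidden state.
E = np.array(
    [[1.0, 0.0],     # Conversion: always observed as Happy.
     [0.7, 0.3],     # Retention : 70% Happy, 30% Not Happy.
     [0.8, 0.2],     # Upsell    : 80% Happy, 20% Not Happy.
     [0.0, 1.0]])    # Churn     : always observed as Not Happy.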

To visualize this a bit better we observe;

The following code function provides the likelihood of observing a sequence of observable states (e.g., Happy, Happy, Not Happy, Happy, Not Happy, Not Happy, …) given a hidden Markov chain describing the underlying process that may influence the observable states.

def hmm_find_prob(seq, T, E, pi):
    # HIDDEN MARKOV CHAIN LIKELIHOOD OF A SEQUENCE OF OBSERVABLES (forward algorithm).
    # seq  : The observable sequence (indices into the observable states).
    # T    : Transition probability matrix (hidden states).
    # E    : Emission matrix (observable states).
    # pi   : The steady-state of the hidden Markov chain.

    first_obs = seq[0]
    alpha = pi*E[:, first_obs]

    for i in range(1, len(seq)):
        curr_obs = seq[i]
        alpha = (alpha.dot(T))*E[:, curr_obs]

    prob = sum(alpha)

    return prob

In order to estimate the likelihood of sequences of observable sentiment states (i.e., Happy and Not Happy) for our hidden Markov process (illustrated above), we need to code the matrices and identify the sentiment sequence we would like to assess the likelihood of.

# Setting up the pre-requisites for a Hidden Markov Model


hidden_states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn'
}


observable_states = {
    0: 'Happy',
    1: 'Not Happy'
}


# Note that even if the hidden and observable states are not directly used
# in this example, it is always good discipline to write them out as they
# tie into both the transition and the emission matrix.


# Transition Probability Matrix (Hidden States)
T = np.array(
    [[0.00, 1.00, 0.00, 0.00],
     [0.00, 0.75, 0.20, 0.05],         # Note we assume 5% churn here!
     [0.00, 1.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 1.00]])


# Emission Matrix (Observable States)
E = np.array(
    [[1.0, 0.0],
     [0.7, 0.3],
     [0.8, 0.2],
     [0.0, 1.0]])


pi0 = np.array([1, 0, 0, 0])
pi = steady_state(pi0, T)          # Steady State of the underlying Markov chain.


seq = [0,0,0,0,0]       # Happy, Happy, Happy, Happy, Happy

print('Likelihood of having 5 consecutive happy experiences: ', np.round(100*hmm_find_prob(seq, T, E, pi),0), '%')


seq = [1,1,1,1,1]       # Not Happy, Not Happy, Not Happy, Not Happy, Not Happy

print('Likelihood of having 5 consecutive negative experiences: ', np.round(100*hmm_find_prob(seq, T, E, pi),0), '%')

output>> Likelihood of having 5 consecutive happy experiences:  4.0 %
output>> Likelihood of having 5 consecutive negative experiences:  78.0 %

We have to draw the conclusion that there is a much higher chance of having 5 consecutive negative experiences (78%) than 5 consecutive positive experiences (4%). This should not be too surprising, as the steady state (long-term behavior) of the customer life-cycle process with a 5% churn rate puts 77% of the likelihood in the churn state. As we assess that a customer in the churn state is 100% likely to be unhappy, it is clear that unhappiness would be the overwhelming expectation for the defined process. It should be noted that, as an alternative, we could also run the decision process replacing monetary value with sentiment value and optimize our decision process for customer sentiment.

If I could lower the churn rate in my decision process to 1% (from 5%), increasing the Retention rate to 79% and keeping the Upsell rate at 20%, I would have an 18% chance of 5 consecutive happy experiences and only a 4% chance of 5 unhappy experiences (see the sketch below).
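
A minimal sketch of that re-run, reusing the steady_state() and hmm_find_prob() functions and the emission matrix E from above;

# Re-run the sequence likelihoods with a 1% churn rate.
T_low = np.array(
    [[0.00, 1.00, 0.00, 0.00],
     [0.00, 0.79, 0.20, 0.01],     # Retention row with 79% retention and 1% churn.
     [0.00, 1.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 1.00]])

pi_low = steady_state(np.array([1, 0, 0, 0]), T_low)

print('5 consecutive happy experiences   : ', np.round(100*hmm_find_prob([0]*5, T_low, E, pi_low), 0), '%')
print('5 consecutive negative experiences: ', np.round(100*hmm_find_prob([1]*5, T_low, E, pi_low), 0), '%')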

It is not that difficult to see how these principles can be applied to many different settings in the digital domain interfacing with customers.

Wrapping it up.

In this blog, I have provided the reader with some insights into how to apply Markov Chains and Markov Decision Processes to important business applications. The Python code snippets you have met throughout this blog can directly be used in much more complex decision processes or deeper dives into Markov chains in general.

I have found it difficult to find reasonably comprehensive examples of how to move from Markov chains (or models) to Markov Decision Processes as applied to a data-driven business environment. You can find some good examples of applied Markov chains (see “For further study” below; I really recommend Normalized Nerd’s YouTube videos for an introduction). There is, however, very little material bridging the gap from Markov models to decision processes. While the math behind both Markov chains and Markov Decision Processes is not very difficult, it is frequently presented in a way that makes it incomprehensible unless you have an advanced degree in mathematics. When you write it out for simpler Markov chains or decision processes, you will see that it is not that difficult (I am still pondering giving a few examples in this blog).

I recommend that you always run a couple of Monte Carlo simulations on your Markov chain and decision process, comparing the outcome with the algorithmic approaches you might apply, such as steady-state derivation, value and policy optimization, etc. If your process contains absorbing states, be extra careful about how they should be reflected in your overall process valuation and optimization. For example, if you consider churn in a customer life-cycle process, this would be an absorbing state, and unless you are careful in the design of the underlying Markov chain, it may overshadow all other dynamics of your process. A Monte Carlo simulation might reveal such issues. Also, start simple, even if your target decision process may be very complex. This will allow you to understand and trust changes as you add complexity to your process.

I hope that my code snippets will likewise make this field more approachable. Anyway, if you want to deep-dive into the math as well, you will find some good starting points in my literature list below.

“We should not fault an agent for not knowing something that matters, but only for having known something and then forgotten.”, The unremembered.

Acknowledgement.

I greatly acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this blog. Also, many of my Deutsche Telekom AG, T-Mobile NL and industry colleagues have in countless ways contributed to the thinking and ideas leading to this little blog. Thank you!

For further study.

Romain Hollander, “On the policy iteration algorithm for PageRank Optimization”, MIT Report (June 2010).

Kim Kyllesbech Larsen, “Data-driven decision making … what’s not to like about that?”, LinkedIn Article (November 2021).

On Markov Chains in particular I recommend Normalized Nerd‘s lectures (super well done and easy to grasp, respect!). I recommend having a Python notebook on the side and building up the lectures there. In any case, if this is new to you, start here; “Markov Chains Clearly Explained! Part – 1” (there are 7 parts in total).

Somnath Banerjee, “Real World Applications of Markov Decision Process”, towardsdatascience.com, (January 2021). Source of examples that can be worked out with the Python codes provided in this blog.

Ridhima Kumar, “Marketing Analytics through Markov Chain”, towardsdatascience.com, (January 2019). Source of examples that can be worked out with the Python codes provided in this blog.

Richard Bellman, “The theory of dynamic programming”, The RAND Corporation, P-550 (July 1954). A classic with the strength of providing a lot of intuition around value and policy iteration.

Dorsa Sadigh, Assistant Professor at Stanford University’s Computer Science Department, “Markov Decision Processes – Value Iteration | Stanford CS221”, (Autumn 2019).

Neil Walton, Reader at University of Manchester’s Department of Mathematics, “Algorithms for MDPs” (2018). Very good lectures on Markov Decision Processes and the algorithms used. They are somewhat mathematical but the examples and explanations given are really good (imo).

Rohan Jagtap, “Understanding Markov Decision Process (MDP)”, towardsdatascience.com, (September 2020). Provides an intuitive as well as mathematical treatment of value iteration. For the mathematically inclined this is a very good treatment of the topic.

A. Aylin Tokuc, “Value iteration vs Policy iteration in Reinforcement learning”, (October 2021). Provides a nice and comprehensible overview of value and policy iteration.

Paul A. Gagniuc, “Markov Chains – From theory to implementation and experimentation“, Wiley, (2017). Lots of great examples of applied Markov Chains.

Richard J. Boucherie & Nico M. van Dijk, “Markov Decision Processes in Practice“, Springer (2017). Great book with many examples of MDP implementations.

Ankur Ankan & Abinash Pranda, “Hands-On Markov Models with Python“, Packt (2018). Provides many good ideas and inspiration for how to code Markov chains in Python.

Brian Hayes, “First Links in the Markov Chain”, American Scientist (March-April 2014). Provides a very easy to read and interesting account of Markov Chains.

Kim Kyllesbech Larsen, “MarkovChains-and-MDPs“, The Python code used for all examples in this blog, (December 2021). The link does require you to have a Github account.

Deep Dive – Markov chain & decision process fundamentals.

Andrei Andreevich Markov (1856–1922) developed his idea of states chained (or connected) by probabilities after his retirement at the age of 50 (i.e., it is never too late to get brilliant ideas). This was at the turn of the 20th century. One of the most famous Markov chains, which we all make use of pretty much every day, is the pages of the world wide web, with 1.5+ billion indexed pages as designated states and maybe more than 150+ billion links between those web pages, which are equivalent to the Markov chain transitions taking us from one state (web page) to another state (another web page). Google’s PageRank algorithm, for example, is built upon the fundamentals of Markov chains. The usefulness of Markov chains spans many fields, e.g., physics, chemistry, biology, information science/theory, game theory, decision theory, language theory, speech processing, communication networks, etc.

There are a few concepts that are important to keep in mind for Markov Chains and Markov Decision processes.

Concepts.

Environment: is the relevant space that the Markov chain operates in. E.g., could be the physical surroundings of a logistic storage facility where a robot is moving around.

State: A state is a set of variables describing a system that does not include anything about its history (the physics definition). E.g., in classical mechanics the state of a point mass is given by its position and its velocity vector (i.e., where it is and where it goes). It is good to keep in mind that the computer science meaning of state is different, in the sense that a stateful agent is designed to remember preceding events (i.e., it “remembers” its history). This is however not how a state for a Markov chain should be understood. A sequence (or chain) of random variables {S0, S1, S2, … , Sn}, describing a stochastic process, is said to have the Markov property if

P[ S(t+1) | S(t), S(t-1), … , S(0) ] = P[ S(t+1) | S(t) ],

that is, a future state of the stochastic process depends only on the state immediately prior and on no other past states. To make the concept of state a bit more tangible, think of a simple customer life-cycle process with (only) 4 states considered; (S0) Conversion, (S1) Retention, (S2) Upsell and (S3) Churn. Thus, in Python we would define the states as a dictionary,

# Example: Customer life-cycle process (simple)
# Defining States


states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn'
}

In our example, the state space is a vector of dimension 4×1, either represented by S = (0 1 2 3) or alternatively S = (Conversion, Retention, Upsell, Churn). More generally, S is an n×1 vector for n states.

If a reward or penalty has been assigned to the end-state, that terminates your decision or reward process, it is worth being extra careful in your Markov chain design and respective transition probability matrix. You may want to introduce a zero-value end-state. Though, it will of course depend on the structure of the decision process you are attempting to capture with the Markov Chain.

Transition: Describes how a given state transitions from one state s to another s’ (which can be the same state) in a single unit of time increment. The associated (state) transition probability matrix T provides the probabilities of all state-to-state transitions for a Markov chain within a single unit of time. T is a square stochastic matrix defined by the number of states making up the Markov chain (i.e., for n states, T is an n×n matrix). We write the state transition, facilitated by T, as:

s(t+1) = s(t) ∙ T per unit time step increment (iteration).

Action: an action a is defined as a choice or decision taken at the current unit of time (or iteration) that will trigger a transition from the current state into another state in the subsequent single unit of time. An action may be deterministic or random. The consequence of an action a, choice or decision, is described by the (state) transition matrix. Thus, the choice of an action is the same as a choice of a state transformation. The set of actions for a given Markov chain is typically known in advance. Actions are typically associated with what is called a Markov Decision Process. Choosing an action a at time t, in a given state s and transitioning to state s’, may result in a reward R(s, a, s’).

Policy: A policy represents the set (distribution) of actions associated with a given set of states (representing the Markov chain) and the respective (state) transition probability matrix. Think about a customer life-cycle process with two policies, (1) no churn remedies (or actions) and (2) churn-mitigating remedies (or actions). Policies can differ only slightly (i.e., different actions on a few states) or be substantially different. It is customary to denote a policy as π(a | s), which is the math way of saying that our policy is a distribution of actions conditional on given states,

π is a function such that π : S → A, with π(a | s) = P[ A(t) = a | S(t) = s ].

A policy, strategy or plan, specifies the set of rules governing the decision that will be used at every unit time increment.

Reward: Is defined for a given state s and is the expected reward value over all possible states that one can transition to from that state. A reward can also be associated with a given action a (and thus may also differ between policies π). The reward is received in state s subject to action a transitioning into state s’ (which can be the same state as s). Thus, we can write the reward as R(s, a, s’) or, in case the reward is independent of the state being transitioned to, R(s, a).

The concept of reward is important in so called Markov Reward Processes and essential to the Markov Decision Process. It is customary (and good for convergence as well) to introduce a reward discount factor 0 ≤ γ ≤ 1 that discounts future rewards with γ^t. Essentially attributing less value (or reward) to events in the future (making the present more important). A positive reward can be seen as an income and a negative reward as a cost.

Thus, a Markov chain is defined by an (S, T)-tuple, where S are the states and T the (state) transition probability matrix facilitating the state transitions. A Markov Reward Process is then defined by an (S, T, R, γ)-tuple, with the addition of R representing the rewards associated with the states and γ the discount factor. Finally, a Markov Decision Process can be defined by an (S, A, T, R, γ)-tuple, with A representing the actions associated with the respective states.

The Markov Chain.

The conditional probability of being in a given state S at time t+1 (i.e., S(t+1)), given all the previous states {S(t=0), S(t=1), …, S(t=t)}, is equal to the conditional probability of state S(t+1) conditioned only upon the immediately previous state S(t),

P[ S(t+1) | S(t), S(t-1), … , S(0) ] = P[ S(t+1) | S(t) ],

∀ S(t) ∊ Ω, where Ω is the environment (state space) the Markov chain exists in.

In other words, the state your system is in now, S(t), depends only on the state you were in one unit time step ago, S(t-1). All other past states have no influence on your present state. Or, said another way, the future only depends on what happens now, not on what happened before.

T facilitates the transition S(t) = i → S(t+1) = j, with transition likelihood p_ij = P[ S(t+1) = j | S(t) = i ] representing the probability of transitioning from state i to state j upon a given action a taken in state i. We will regard T as an (n × n) transition matrix, describing how states map to each other,

where the rows represent the state transitioned from and the columns the state transitioned to. Moreover, as we deal with probabilities, each row needs to add up to 1, i.e., Σ_j p_ij = 1 for every state i.

Let’s simplify a bit by considering a 4-state Markov chain;

with the following Markov chain 4-state example,

with the following transition probability matrix T,
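
For reference, this is the same 4-state transition probability matrix we used in the customer life-cycle example earlier, written out in Python;

# 4-state transition probability matrix T (Conversion, Retention, Upsell, Churn).
T = np.array([
    [0.00, 1.00, 0.00, 0.00],
    [0.00, 0.75, 0.20, 0.05],
    [0.00, 1.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00]])

# Sanity check: each row of a transition probability matrix must sum to 1.
assert np.allclose(T.sum(axis=1), 1.0)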

From the above illustration we have that our states (i, j) ∈ {Conversion (0), Retention (1), Upsell (2), Churn (3)}. Thus, T(1,1) = 0.75 is the probability that an action in the Retention state results in ending up in the same Retention state. Note that the first row (first column) is designated 0, the second row (column) 1, etc. As we sum the 2nd row, T(1, 0 → 3), we get 1 (i.e., 0.00 + 0.75 + 0.20 + 0.05 = 1), as we require.

Let us consider the following initial condition at time t = 0 for the above Markov model,

s0 = ( 1 0 0 0 ) we are starting out in the Conversion (initial) state s0.

s1 = s0 T = ( 0 1 0 0 ), at first time step (iteration) we end up in the Retention state.

s2 = s1 ∙T = s0 ∙T∙T = s0 ∙T^2 = ( 0.00 0.75 0.20 0.05 ). So already in 2nd time step (iteration) we have 75% likelihood of again ending up in the Retention state, 20% of ending up in the Upsell state as well as 5% chance that our customer Churn and thus ends the Markov process.

s3 = s2 ∙T = s0 ∙T∙T∙T = s0 ∙T^3 = ( 0.00 0.76 0.15 0.09 )

s10 = s9 ∙T = s0 ∙T^10 = ( 0.00 0.56 0.12 0.32)

s36 = s35 ∙T = s0 ∙T^36 = ( 0.00 0.19 0.04 0.77 )
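
A small sketch to reproduce this state evolution, assuming the 4-state transition matrix T written out above;

# Repeatedly apply the transition matrix to the initial state s0.
s = np.array([1.0, 0.0, 0.0, 0.0])
for t in range(1, 37):
    s = s.dot(T)
    if t in (1, 2, 3, 10, 36):
        print('s' + str(t), '=', np.round(s, 2))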

Eventually, our overall Markov chain will reach steady state and s ∙ T = s. It is common to use π for the Markov chain steady state. Thus, we will frequently see π ∙ T = π, reflecting that steady state has been reached (usually within some level of pre-defined accuracy). To avoid confusion with policy mapping, which is often also described by π, I prefer to use π∞ to denote that the steady-state state has been reached.

Within a pre-set accuracy requirement of ε < 0.01, we have that s36 has effectively reached steady state, i.e., s37 ≈ s36 and thus π∞ ≈ s36.

It should be noted (and it is easy to verify) that introducing a 5th End state (i.e., splitting the combined churn-and-end state into two states) in our example will not change the steady-state outcome, except for breaking up churn’s steady-state value (from the 4-state steady-state analysis) into two values whose sum equals the 4-state churn value.

Value Iteration.

We start out with a Markov chain characterized by an (S, T)-tuple that describes the backbone of our decision process. We have the option to add actions (it can be a single action as well) and associate rewards with the respective states and actions in our Markov chain. Thus, we expand the description of our Markov chain to that of a Markov Decision Process (MDP), that is an (S, A, T, R, γ)-tuple (or, for a Markov Reward Process, an (S, T, R, γ)-tuple), with γ being the discount factor (0 ≤ γ ≤ 1). Rohan Jagtap, in his “Understanding Markov Decision Process (MDP)”, has written a great, intuitive and very accessible account of the mathematical details of MRPs and MDPs. Really a recommended read.

We have been given a magic coin that always ends up at the opposite face of the previous coin flip, e.g., Head → Tail → Head → Tail → etc. Thus we are dealing with a 2-state process that periodically cycles between the two states (i.e., after 2 tosses we are back at the original face), each state having probability 1 of transitioning to the other. Also, we are given a reward of +2 (R(H)) when we transition into the Head state (S0) and a reward of +1 (R(T)) when we transition into the Tail state (S1). We thus have 2 initial conditions: (a) starting with Head and (b) starting with Tail.

How does the long-run (i.e., steady-state) expected value for each of the two states H & T develop over time?

(a) Assume our magic coin’s first face is Head (H), which earns us a reward of R(H) = +2. At the next unit time step we end up in Tail (T) with probability 1 (= P[T|H]) and a reward of R(T) = +1. The next step we are back in Head with probability 1 (= P[H|T]), and so forth. The future value we may choose to discount with γ (and if γ is less than 1, it even guarantees that the value converges). For (b), interchange H with T (and of course the rewards accordingly).

It is good to keep in mind that the reward is banked when in the state, after transitioning into it from the previous state. The value accrued over time at a given state is the present reward R(s) plus the expected (discounted) reward of the subsequent states. It is customary to start out with zero-value states at t=0, though one could also choose to use the reward vector to initialize the value of the states. So, here it goes,

Alright, no, I did not sum all the way to infinity (I wouldn’t have finished yet). I “cheated” and used the ‘mdp_valueIteration()’ function;

# Import own Markov chain (MC) & Markov Decision Process (MDP) library
import mcmdp_v2 as mdp


# States
states = {
    0 : 'Head',
    1 : 'Tail'
}


# Transition Matrix - Magic Coin
T = np.array([[0.00, 1.00],
              [1.00, 0.00]])


# Reward Matrix - Magic Coin
R = np.array([[2], 
              [1]])


pi = np.array([1, 0,]) # Initial state, could also be [0, 1].


# Define the markov chain mc for the MDP value iteration.
mc = mdp.Mdp(states = states, pi = pi, T = T, R = R, gamma = 0.9, epsilon = 0.01)


state_values, expected_total_value, policy, ctime = mc.mdp_valueIteration() # Value iteration on mc


print('Long-run state value V[H]   :', np.round(state_values[0],1))
print('Long-run state value V[T]   :', np.round(state_values[1],1)) 

output>> Long-run state value V[H]   : 15.2
output>> Long-run state value V[T]   : 14.7
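
As a sanity check, the geometric series behind the magic coin can also be summed in closed form. A minimal sketch, using the same γ = 0.9 and rewards as above;

# Closed form: V(H) = R(H) + gamma*V(T) and V(T) = R(T) + gamma*V(H)
# => V(H) = (R(H) + gamma*R(T)) / (1 - gamma**2), and symmetrically for V(T).
gamma = 0.9
R_H, R_T = 2, 1
V_H = (R_H + gamma*R_T)/(1 - gamma**2)
V_T = (R_T + gamma*R_H)/(1 - gamma**2)
print(np.round(V_H, 1), np.round(V_T, 1))   # ≈ 15.3 and 14.7, in line with the value iteration above.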

In general, we have the following value iteration algorithms representing the state-value function as we iterate over time step i.
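
Written out in standard textbook notation (matching the three cases described just below), the updates are;

[1]  V_{i+1}(s) = max_a Σ_{s'} T(s, a, s') ∙ [ R(s, a, s') + γ ∙ V_i(s') ]

[2]  V_{i+1}(s) = R(s) + γ ∙ max_a Σ_{s'} T(s, a, s') ∙ V_i(s')

[3]  V_{i+1}(s) = R(s) + γ ∙ Σ_{s'} T(s, s') ∙ V_i(s')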

Formula [1] describes the general MDP algorithm. Formula [2] is an MDP where the state reward function is independent of the actions and of the subsequent state s’, and formula [3] describes a Markov Reward Process, where the reward function R is independent of the subsequent state s’. In order to get the value iteration started it is customary to begin with an initial condition (i.e., i = 0) of V_0 = 0 ∀ s ∊ S, e.g., for a 5-state process V_0 = [0, 0, 0, 0, 0] at i = 0; that is, the initial value of all states in the Markov chain is set to zero.

The long-run steady-state state values are the outcome of iterating the above formulas [1 – 3] until the state values no longer change (within a pre-determined level of accuracy). We can write the long-run steady-state values as V∞[S] = ( V∞[S1], V∞[S2], … , V∞[Sn] ),

where V∞[Sj] is the j-th state’s steady-state value and n is the number of states in the underlying Markov chain representing the MDP (or MRP for that matter).

The long-run average (overall) value G in steady state is G = V∞[S] ∙ π∞,

where V∞[S] is the steady-state value vector that the value iteration provided us with. π∞ is the decision process’s underlying Markov chain’s steady-state state.

One of the simpler examples to look at is a “coin toss” process. In order to make it a bit more interesting, let’s consider an unfair-ish coin to toss around. In the first example immediately below, we assume there is only 1 action and that the state rewards depend only on the state itself. Thus, we are in the formula [3] situation above. How we work through the above value-iteration algorithm is illustrated below,

Let us have another look at our customer life-cycle process. We would like to have a better appreciation of the value of each state in the decision-making process. The value iteration approach is provided in the illustration below,

Coding reference.

Kim Kyllesbech Larsen, “MarkovChains-and-MDPs“, The Python code used for all examples in this blog, (December 2021).