*“Would you like an adventure now, or shall we have tea first?”, Alice in Wonderland (Lewis Carroll).*

Google parent company Alphabet breached a US$2 (~€1.8) trillion valuation in November this year (2021). What few realize is that this valuation, as well as the success of Google, is built on a Markov Decision Process and its underlying Markov chain. That makes PageRank, the brainchild of Sergey Brin and Larry Page, likely the most valuable and most used application of a Markov chain ever, period. We all make use of a Markov chain and a Markov Decision Process pretty much every day, many times a day. The pages of the world wide web, with its 1.5+ billion indexed pages, can be designated states of a humongous Markov chain. And the more than 150 billion hyperlinks between those web pages can be seen as the equivalent of Markov state transitions, taking us from one state (web page) to another state (another web page). Markov Decision Process value and policy iteration ensures that the consumers of Google’s search engine get the best search results in the fastest possible way. Many hyperlink paths may lead to your page of interest, but many (maybe most) of those paths will not get you where you want to be fastest. Optimization algorithms developed on the world-wide-web Markov chain will find the best path to the content you want (or the content the algorithm “thinks” you want). I cannot think of a better motivating example for business people to get acquainted with data-driven decision processes and the underlying Markov chains.

In the course of my analysis of corporate decision makers’ sentiment towards data-driven decision making and processes, it became clear that there is very little comprehensive material (for business people) on structured, hands-on approaches to decision processes. How do you go about implementing a data-driven decision process in your company? And this despite such approaches being fantastic tools for structuring and optimizing business decision-making processes. Not only that. We can integrate such decision-process algorithms into our digital platforms, achieving a very high degree of autonomous business decisions (e.g., Google’s PageRank decision process) that in turn will enhance our customers’ experience. It also allows us to monitor the overall quality of customer interactions with our digital environment(s) (e.g., web-based environments, user interfaces, closed-loop user experience, digital sales, apps, bots, etc.). Such data-driven integration would result in more efficient business processes, internally as well as towards external customers and partners.

So, my goal, with this blog, is to take you through the essentials, the enablers, to better understand what a decision process may look like and its wide applicability towards business-relevant decision making.

I like hands-on. The focus of this blog is on providing coding examples that the reader can implement and play with in parallel with the examples given. Maybe, even better, it will encourage you to create your own examples, relevant to your area of business interest. I will take you through two important enablers for digitizing data-driven decision processes. One enabler is the so-called Markov Chain, which is essential for understanding the other, the Markov Decision Process (MDP). I will attempt to do this intuitively, via coding examples, rather than by writing down a lot of mathematical notation (which is the normal approach). I will demonstrate some relatively simple business examples that hopefully will create an appetite to learn much more about this field.

At the very end of this blog, you can either stop reading or you can continue (an MDP maybe?) and will then find a more formalistic treatment of Markov Chains and MDPs. Although somewhat more “long haired” (coming from a bald guy), I attempted to make it accessible to readers who may not have a math degree.

In my previous blog, “Data-driven decision making … what’s not to like about that?”, I wrote about data-driven decision making for businesses and public institutions. I described, or more accurately pulled out of my magic hat, a so-called Markov Decision Process that we could view as a data-driven decision process.

In the example above, I mapped out how a data-driven decision process might (or should) look. The process consists of 6 states or stages (i.e., Idea, Data Gathering, Insights, Consultation, Decision, Stop) and actions that take us from one state to another (e.g., Consult → Decision), until the final **Decision** state, where we may decide to continue developing, thus going back to **Idea**, or decide to terminate (i.e., **Stop**). We can associate the transitions and their actions with likelihoods of a given state transition occurring (e.g., Insights → Consult vs Insights → Decision, …), based on empirical evidence that fancy people may describe as process mining.

The described decision process is not static. It is dynamic and likely to evolve over time. Transitions from one state to another can be seen as moving forward in time increments. You find yourself in one stage of the process and make up your mind, that is, you make a decision or take an action, about which state to move to next, with a given likelihood of making that decision if there are several stages to move to, and depending on the level of stochasticity. The rules for how to move from one state to the next are given by the scenario you are considering.

The above illustration may (or may not) look complex. So let’s break down, at a higher level, what we see above. We have circles that represent a **Stage** in the decision process, e.g., Idea, Data Gathering, Insights, Consult, Decision, Stop. These stages are what we could also call **States**, which is the official name if we speak Markovian. I will use stage and state interchangeably. In between the stages we have white arrows that take us, or **Transition** in Markovian, from one stage to the next (e.g., from Insights to Consult). A transition from one stage to the next is triggered by an **Action** (e.g., a decision) in the stage we are in that leads us to the following stage in our process.

For each transition and associated action we can assign a **Transition Probability** that describes the likelihood of that particular transition & action combination taking place. For example, in the above chart, I have a transition probability from the Insights stage to the Consult stage of 40%. I can also associate a reward R with a given transition. This reward is “banked” as you enter the new stage. For example, I can expect a reward of €R23 (see illustration above) as I transition from the Insights to the Consult stage. An arrow that goes back to its own stage simply means that a decision, or action, is taken that results in us remaining in that particular state (e.g., a 20% chance of remaining in the Decision stage, which here implies that we continue as is). As transitions are associated with probabilities, it follows that the sum of the transition probabilities (arrows) leaving a state is required to equal 1.

It is possible to define several transition probability scenarios per state. Such scenarios are called policies in the language of Markov. The idea is that we can run through several scenarios, or policies, to determine whether some are better than others and thus optimize our decision-making process (assuming the optimal policy is also feasible).
Typically, actions, (state) transitions, and associated rewards are not symmetric, in the sense that the likelihood (& reward) of going from State 1 to State 2 may not be the same as going from State 2 back to State 1.
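To make this concrete, a Markov chain is fully specified by its transition probability matrix. Below is a minimal sketch of what the 6-stage decision process above could look like in code; apart from the 40% Insights → Consult and the 20% Decision → Decision probabilities mentioned above, the numbers are illustrative assumptions, not the ones from the chart.

```python
import numpy as np

# States: 0 Idea, 1 Data Gathering, 2 Insights, 3 Consult, 4 Decision, 5 Stop
# Illustrative transition probability matrix (rows = from-state):
T = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],   # Idea -> Data Gathering
    [0.2, 0.0, 0.8, 0.0, 0.0, 0.0],   # Data Gathering -> Insights, or back to Idea
    [0.0, 0.0, 0.0, 0.4, 0.6, 0.0],   # Insights -> Consult (40%, as in the chart) or Decision
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],   # Consult -> Decision
    [0.5, 0.0, 0.0, 0.0, 0.2, 0.3],   # Decision -> Idea, remain (20%), or Stop
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])  # Stop (absorbing)

# Each row must sum to 1 - a row that does not usually means a forgotten transition.
print(np.allclose(T.sum(axis=1), 1.0))
```

The row-sum check at the end is a useful habit whenever you set up a chain by hand.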

What I have described above is the fundamental setup of a so-called Markov Chain and how it can be extended with actions (e.g., decisions), rewards (e.g., income & cost) and policy (i.e., scenario) estimation and optimization. Thus, into what we call a Markov Decision Process (MDP) or its “cousin” the Markov Reward Process (MRP).

The question remains, of course, how do we actually do some meaningful analysis on a decision process as illustrated above? How do we code this?

While you will find some examples in the public domain on the analysis of **Markov Chains**, it becomes a bit more technical as you move up the “intellectual food chain” to **Markov Decision Processes**. I have provided some simple, but generalized, Python code throughout this blog. This will allow you to run some of this analysis yourself and gain a lot of insight into Markov chains and (Markov) Decision Processes.

### A simpler customer life-cycle & retention example.

Let’s take a simple example of the customer life-cycle of a typical subscription (e.g., online, magazine, services, …). We start our process with the **Conversion** of a prospective customer to a real customer, kicking off the customer life-cycle. After the sale, the customer starts the service, which below is defined as **Retention**; we want to keep our customer with that service. During the retention period, or life as a customer, the illustration below assumes 3 events may happen: **(1)** our customer is okay with the service and wishes to continue as is (i.e., remains in the Retention stage), **(2)** our customer is interested in adding additional features to the existing subscription, thus accepting an **Upsell**, after which the customer falls back into the Retention phase, and finally **(3)** our customer may decide to discontinue the subscribed service, thus **Churn**, ending the engagement and the customer life-cycle process. In fact, once the Churn state has been reached, it cannot be left. We call such a state an **absorbing state**. A Markov chain that includes at least one absorbing state is called an **absorbing Markov chain**.

In the above customer life-cycle illustration, we have 4 states (or stages): Conversion (**S0**), Retention (**S1**), Upsell (**S2**) and Churn (**S3**). The transition from one stage to the next (e.g., Retention → Upsell) is driven by a decision (i.e., action) to do so. Each transition is associated with the likelihood of that particular decision being made, e.g., there is a 20% chance that a customer decides to accept an Upsell when located in the Retention state.

In Python we can operationalize the process as follows;

```python
import numpy as np

# Customer life-cycle process (simple)

# Define states
states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn'
}

c = 0.05  # Churn likelihood
u = 0.20  # Upsell likelihood

# Transition probability matrix
T = np.array([
    [0.00, 1.00,  0.00, 0.00],
    [0.00, 1-u-c, u,    c   ],
    [0.00, 1.00,  0.00, 0.00],
    [0.00, 0.00,  0.00, 1.00]])
```

Each row (and column) in the transition probability matrix represents a stage in the process. Note that each row is required to sum to 1. Our curiosity about the dynamics of our process, our chain of events, should lead us to ask *“In the long run, what are the chances of ending up in each of the defined states of our process?”*. For our customer life-cycle process, we might in particular like to understand our long-term churn proportion given the current scenario, and maybe even the trade-offs between upsell and churn rate.

How do we go about answering that question?

Our process has reached steady state when the likelihood of ending up in a given state no longer changes with subsequent evolution of time. What this means is that for a given overall process state, represented by **π**, applying the transition matrix **T** no longer changes that overall process state (i.e., **π**∙**T** = **π**). Thus, **π** then represents the expected likelihood of being in a given state once steady state has been reached.
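As a side note, the condition **π**∙**T** = **π** says that π is a left eigenvector of **T** with eigenvalue 1, so for small chains the steady state can also be obtained directly with linear algebra rather than by iteration. A minimal sketch on an illustrative 2-state chain (made-up numbers, not our life-cycle example):

```python
import numpy as np

# Illustrative 2-state chain (assumed numbers, not the life-cycle example)
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# The steady state is the left eigenvector of T with eigenvalue 1,
# i.e., an eigenvector of T transposed, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(T.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

print(np.round(pi, 3))             # -> [0.833 0.167]
print(np.allclose(pi.dot(T), pi))  # check: pi . T = pi
```

Be aware that for an absorbing chain like our life-cycle example this exact method puts all the probability mass in the absorbing state; the epsilon-truncated iteration used below instead captures the transient mix before absorption.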

In our illustration above, we can kick off our process with **π0** = [1, 0, 0, 0], which represents that our initial stage is the Conversion state. Applying the transition matrix **T** to our initial state will transition it into the Retention stage, i.e., **π0**∙**T** = [0, 1, 0, 0] = **π1**. The next step then is **π1**∙**T** = [0, 0.75, 0.20, 0.05] = **π2** and so forth until **π**∙**T** = **π** for all subsequent time steps (algorithmically we can describe this iterative process as **π** ← **π**∙**T**). Following this recipe we can create a small Python code that will do the work for us;

```python
def steady_state(pi, T, epsilon=0.01):
    # MARKOV CHAIN STEADY STATE
    # pi      : Given n states, pi is an array of dim (n,).
    # T       : The transition probability matrix of dim (n,n).
    # epsilon : Convergence criterion.
    j = 0  # Iteration counter
    while True:
        oldpi = pi
        pi = pi.dot(T)
        j += 1
        # Check convergence
        if np.max(np.abs(pi - oldpi)) <= epsilon:
            break
        # Safety stop in case of no convergence
        if j == 1000:
            break
    return pi  # Returning the likelihood of the steady-state states
```

Using the above code, the transition matrix **T **given above and our customer life-cycle process initial condition π0 = [1, 0, 0, 0], we find that

```python
# Finding the customer life-cycle steady state
pi0 = np.array([1, 0, 0, 0])
pi = steady_state(pi0, T)  # Using the steady_state() function above
print(np.round(pi, 2))

output>> steady-state pi = [0, 0.19, 0.04, 0.77]
```

So, within the existing customer life-cycle scenario, a churn rate of 5% will in the long run lead to an overall churn likelihood of almost 80%. This is with a Retention transition probability of 75% and an Upsell transition probability of 20%. It may be a surprising outcome that a relatively low-probability action (or event) can lead to such a dominant business impact. Intuitively, we should remember that churn is a terminal event. Retention and even Upsell simply continue to operate within the life-cycle process with the ever-present “doom” of a customer leaving by churning. More detailed analysis of this process shows that as long as we keep the Churn transition probability below 1.3%, the Churn state’s steady-state likelihood remains below 3.3 times the transition probability. Above a 1.3% transition probability, the steady-state Churn likelihood rapidly increases to 10 – 20 times the base transition probability. Needless to say, the current policy, as represented by the transition matrix, would require optimization in order to minimize the churn impact, and ideally we would need to aim for measures keeping churn below 1.3%.
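You can reproduce this sensitivity with a short sweep over the churn likelihood. The sketch below re-implements the truncated fixed-point iteration inline (same epsilon = 0.01 convergence criterion as the steady_state() function above) and reports the resulting churn share; the specific c values are just sample points.

```python
import numpy as np

def churn_share(c, u=0.20, epsilon=0.01, max_iter=1000):
    # Epsilon-truncated steady state of the 4-state life-cycle chain
    T = np.array([[0.0, 1.0,   0.0, 0.0],
                  [0.0, 1-u-c, u,   c  ],
                  [0.0, 1.0,   0.0, 0.0],
                  [0.0, 0.0,   0.0, 1.0]])
    pi = np.array([1.0, 0.0, 0.0, 0.0])
    for _ in range(max_iter):
        newpi = pi.dot(T)
        if np.max(np.abs(newpi - pi)) <= epsilon:
            return newpi[3]  # Churn state's likelihood at convergence
        pi = newpi
    return pi[3]

for c in [0.01, 0.02, 0.05, 0.10]:
    print(c, round(float(churn_share(c)), 2))
```

Running the sweep lets you locate the threshold behavior described above yourself; note that the reported share depends on the truncation epsilon, since the exact long-run limit of an absorbing chain is always 100% churn.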

The purpose of this blog, as stated in the beginning, is not so much studying the dynamics of a hypothetical customer life-cycle management processes or any other for that matter. The purpose of this blog is to provide you with an understanding of decision process modelling, the analysis possible, and tools to go and do that yourself.

### A simple customer life-cycle example with rewards.

How do we get from the process dynamics to an assessment of the value of a given strategy or policy? There are several dimensions to this question. Firstly, we would of course like to enhance the value of the overall decision process. Next, we would also like to ensure that each stage of our process has been optimized in terms of value, with value being monetary, time-wise, number of stages, order of states, etc.

I will provide you with a reasonably simple approach as well as the code to work out your own examples. It is in general straightforward to add complexity to your process and still be able to use the provided code.

In the above illustration, two different scenarios are provided. Both scenarios are represented by the same transition probability matrix. This is not a requirement but makes the example simpler. **Policy 1** mimics an “annual subscription model” with a subscription value of 100 per annum, an upsell value of 20 per annum and a (one-time) loss of value by churn of -100. **Policy 2** represents a “monthly subscription model” with a subscription value of 10 per month, an upsell value of 2 per month and a (one-time) loss of value by churn of -60. I would like to know the overall expected value for each of the two scenarios.

However, **Houston, we (may) have a problem here**. Remember that once we end up in the Churn state we are stuck in it (i.e., we don’t “un-churn” … at least not in this process). From a transition probability matrix perspective, once we are in the Churn state, every new iteration transitions back to the same state. In fact, as already discussed, the Churn state acts as an absorbing state, making our customer life-cycle Markov chain an absorbing Markov chain. In this particular case, we would like to assign a one-time penalty (i.e., negative reward) to the Churn state and then end the process. Of course, this may not always be the case. But here it is. From a (state) transition matrix perspective we have created our own equivalent of “Groundhog Day”, as we keep returning to the same state and thus might end up multiplying the churn penalty by the number of iterations it takes to reach convergence of the overall system. Unless we are careful.

We have **two solutions**: **(1)** choose an appropriately small churn penalty that, with a given discount factor (let’s call it γ), ensures the penalty quickly converges to a reasonable figure (yeah … not the nicest solution, but it could work), and **(2)** introduce an End state with zero reward/penalty that the Churn state ends up in with probability 1, thus pushing the absorbing state away from the Churn state, which is convenient for the valuation estimation process. This ensures that as we end up in the Churn state, its penalty is only counted once. This will be an important consideration as we commence on value iteration for a Markov Decision Process. So, our above illustration needs a bit of revision and gets to look like this,

and our Python implementation of the states, transition matrix and reward vector will look like this,

```python
# Customer life-cycle process (simple), now with an explicit End state

# Define states
states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn',
    4 : 'End'
}

c = 0.05  # Churn likelihood
u = 0.20  # Upsell likelihood

# Transition probability matrix
T = np.array([
    [0.00, 1.00,  0.00, 0.00, 0.00],
    [0.00, 1-u-c, u,    c,    0.00],
    [0.00, 1.00,  0.00, 0.00, 0.00],
    [0.00, 0.00,  0.00, 0.00, 1.00],   # Churn transitions to the End state
    [0.00, 0.00,  0.00, 0.00, 1.00]])  # End state (absorbing)
```

If you run the 5-state transition matrix through the above Python “steady_state()” function, you get **π** = (0, 0.18, 0.04, 0.01, 0.77), where the last two states (i.e., the Churn and End states) belong to the same terminal part of the journey. Mapping it back to the 4-state model we have **π** = (0, 0.18, 0.04, 0.78), which is very close to the “true” 4-state model above (i.e., with the Churn state also being the end state), **π** = (0, 0.19, 0.04, 0.77). From **π** we see that we have a steady-state likelihood of 77% of ending up in the absorbing state in our customer life-cycle process.

For value estimation purposes, we would like to ignore the absorbing state and renormalize the steady-state vector **π**. Thus, **π’** = (0, 0.19, 0.04, 0.01, **0**)/sum((0, 0.19, 0.04, 0.01, **0**)), which results in a renormalized steady-state vector **π’** = (0, 0.79, 0.17, 0.04, **0**) that we will use to assess the long-run average value of our customer life-cycle process.

So let’s do a bit of a random walk across our decision process as depicted above. I will make use of repeated random sampling (also called Monte Carlo simulation among friends) on my decision process. The transition matrix will bring me from one stage to another at the rates provided by the respective transition probability matrix (**T**). The reward vector (**R**), see above illustration, provides the reward expectation for each state (e.g., **R** = [0, 100, 20, -100, 0] for **Policy 1**). As the repeated random sampling progresses and the various decision-process states are visited, we add up the respective states’ rewards until we end up in the Churn state, which ends the simulation. The simulation is then repeated, and the average of all the simulated process values provides our expected overall value for the decision process. We have the option to discount future values with gamma (γ < 1);

**V** = Σₙ γⁿ∙R(sₙ), summed from n = 0 to n*, with n* = min{N, n @ EndState}, where N is the maximum allowed number of simulation time steps and “n @ EndState” is the time step at which our simulation ends up in the End state, terminating the simulation. This principle is similar to financial net present value calculations that value the present more than the future, at, for example, a weighted average cost of capital (WACC) or discount rate. It will also turn out to be mathematically convenient to include a discount rate, as it ensures that our value iteration converges (which is pretty convenient).
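To see why γ < 1 tames the “Groundhog Day” problem, consider a constant reward stream: the discounted sum converges geometrically toward R/(1 − γ) no matter how long the stream runs. A small sketch (the reward of 10 per step is just an illustrative number):

```python
import numpy as np

gamma = 0.9
rewards = np.full(100, 10.0)  # Illustrative: a constant reward of 10 per step

# Discounted sum V = sum_n gamma^n * R_n; with gamma < 1 this converges
# toward R/(1-gamma) = 100, however long the stream runs.
V = sum(r*gamma**n for n, r in enumerate(rewards))
print(round(V, 1))  # -> 100.0
```

The same mechanism bounds a repeatedly counted churn penalty, which is the first of the two solutions mentioned above.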

The Python code for the described simulation is provided here;

```python
def mc_random_walk_reward(states, state0, T, R, gamma=0.9, tot_steps=100, EndState='End'):
    # EXPECTED POLICY & VALUE ASSESSMENT VIA RANDOM WALKS
    # states    : Dictionary with the names of the n states being studied.
    # state0    : Index of the initial state.
    # T         : The transition probability matrix of dim (n,n).
    # R         : The reward array of dim (n,); can have several columns
    #             pending policies.
    # gamma     : Value discount factor, < 1.0.
    # tot_steps : Maximum number of time-step iterations unless EndState is
    #             encountered first.
    # EndState  : The end state that stops the simulation before tot_steps.
    n = tot_steps
    stp = 0
    start_state = state0
    path = [states[start_state]]
    value_state = R[start_state]*(gamma**0)
    prev_state = start_state
    i = 1
    while n:
        curr_state = np.random.choice(list(states.keys()), p=T[prev_state])
        value_state += R[curr_state]*(gamma**i)
        path += [states[curr_state]]
        prev_state = curr_state
        i += 1
        n -= 1
        if states[prev_state] == EndState:
            stp = n + 1  # Bookkeeping: steps left of tot_steps when EndState was hit
            break
    return (path, value_state, stp)
```

The transition matrix remains the same; thus, we only need to define the reward vectors;

```python
# Reward vector for Policy 1 - Annual subscription model
R1 = np.array([[0], [100], [20], [-100], [0]])

# Reward vector for Policy 2 - Monthly subscription model
R2 = np.array([[0], [10], [2], [-60], [0]])
```

Thus, we are ready to run the decision process valuation;

```python
j = 0
stop = 10000  # Total number of Monte Carlo simulations
df_value = []
df_time = []

R = R2  # Use R1 for Policy 1 and R2 for Policy 2

while True:
    path, value, t_stp = mc_random_walk_reward(states, 0, T, R, 0.90, 100, 'Churn')
    df_value.append(value)
    df_time.append(t_stp)
    j += 1
    if j == stop:
        break

unit_time = 1  # 1 month per time-unit iteration
print('Total Value of Policy 2       : ', np.round(np.mean(df_value)/unit_time, 0))
print('Mean time to Churn of Policy 2: ', np.round(np.mean(df_time), 0))

output>> Expected Total Value of Policy 2: 42.0
output>> Average time to Churn of Policy 2: 77.0
```

The expected total value of Policy 2 comes out at ca. 42. Applying the above code to Policy 1 gets us an expected total value of 46, thus a bit more attractive than Policy 2. This provides a fairly easy way to get an idea about a given decision process’s value. As we set the maximum number of time-step iterations (e.g., 100 in the code snippet), it is good to check that the reported churn-time statistic, averaged over the total number of Monte Carlo simulations, stays below this number (e.g., 77 < 100 in the above code snippet). It is wise to play a bit with the maximum number of steps to check whether the expected total value of your policy changes significantly.

It should be noted that, as the Monte Carlo simulation of the Markov chain terminates upon Churn, the churn penalty is only accounted for once, as is appropriate in this particular example. In other words, for this part we would strictly speaking not require the fifth state to end the process.
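For absorbing chains there is also a closed-form cross-check on the time to absorption: the fundamental matrix N = (I − Q)⁻¹, where Q is the transient-to-transient block of **T**, gives the expected number of visits to each transient state, and its row sums give the expected number of steps before absorption. A sketch for the example above (note this counts steps actually taken, whereas the stp bookkeeping in the random-walk function above counts down from the step budget, so the two numbers are not directly comparable):

```python
import numpy as np

# Transient block Q of the life-cycle chain (Conversion, Retention, Upsell),
# with u = 0.20 and c = 0.05 as above; Churn/End form the absorbing part.
Q = np.array([[0.00, 1.00, 0.00],
              [0.00, 0.75, 0.20],
              [0.00, 1.00, 0.00]])

N = np.linalg.inv(np.eye(3) - Q)  # Fundamental matrix: expected visits per state
t = N.sum(axis=1)                 # Expected steps to absorption per start state

print(np.round(t, 1))  # -> [25. 24. 25.] starting from Conversion/Retention/Upsell
```

So a fresh customer is expected to churn after about 25 steps, comfortably inside the 100-step simulation cap used above.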

If we compare the expected value of our process of 42.0 (Policy 2) with the **Value Iteration** algorithmic approach (more on that below), used on Markov Decision Processes, we find steady-state state values of **V**[Policy 2] = [42.2, 46.9, 44.2, -60, 0] = [Conversion, Retention, Upsell, Churn, End], with the End state representing our absorbing state. Using our renormalized steady-state vector **π’** = (0.00, 0.79, 0.17, 0.04, 0.00), we find that our long-run average value is

G = **V** ∙ **π’** = 42.2 ∙ 0.00 + 46.9 ∙ 0.79 + 44.2 ∙ 0.17 + (-60) ∙ 0.04 + 0 ∙ 0.00 ≈ 42.2

This is in pretty good agreement with the Monte Carlo simulation of our customer life-cycle process. And even better … a much faster way of getting the long-run value. However, if you are dealing with one (or several) absorbing states, some caution in how to compensate for those is warranted. In general, I like to run a Monte Carlo process simulation just to ensure that my value-iteration extraction logic is correct.

### Decision process optimization … Dual policy and value iteration.

We are often in situations where we need to evaluate different assumptions or policies in our decision process. Imagine you are looking at a subscription process, as shown below, where you have broken the customer life-cycle down into 3 parts (i.e., states or stages): **(1)** start of subscription, **(2)** average life-time and **(3)** long-time subscription. You are contemplating two different policies. **Policy 0 (white arrows)**: no churn intervention, and **Policy 1 (orange arrows)**: churn intervention measures at each stage in the subscription process. After a churn intervention at a given state, your system will treat the customer as a new customer (**note:** this might not be the smartest thing to do, but it is the easiest to illustrate).

The decision process for such a customer life-cycle is illustrated below. I have deliberately not added an End state after Churn in this example (so, strictly speaking, once we end up in the Churn state we are in our “Groundhog” state of “perpetual” churn … which, by the way, is why we like to call it an absorbing state). An End state adds complexity to the Markov chain, and the purpose of this example is to show policy and value optimization in a situation where we have 2 policies to consider in our decision process.

You would like to know the value at each state, given the observed churn. Moreover, you also want to know when (time-wise) and where (state-wise) it might become important to shift from one policy to the other. So let’s code the above customer loyalty decision process in Python;

```python
# Customer life-cycle process - dual value and policy iteration

# Define subscription states
states = {
    0 : 'Start',
    1 : 'Average life-time',
    2 : 'Long-time',
    3 : 'Churn'
}

# Churn rate; try increasing this in steps of 0.05 and you will see that
# values and policies begin to change as the churn rate increases.
c = 0.05

# Policy 0 - No churn intervention measures
policy_0 = np.array([
    [0.00, 1-c,  0.00, c   ],
    [0.00, 0.00, 1-c,  c   ],
    [0.00, 0.00, 1-c,  c   ],
    [0.00, 0.00, 0.00, 1.00]])

# Policy 1 - Churn intervention measures
policy_1 = np.array([
    [1.00, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00]])

T = np.array([policy_0, policy_1])  # Transition probability matrices

# Reward array for the two policies:
# 1st column reflects rewards under Policy 0,
# 2nd column reflects rewards under Policy 1.
R = np.array([
    [ 0,  0],
    [ 0,  1],
    [ 4,  2],
    [-1, -1]])

print(states)
print(T)
print(R)
```

In order to get the optimal decision-process value and the respective policy, we make use of an iterative algorithm called **Value Iteration**. The value iteration procedure provides the values of the various states of our decision process given known transition probabilities and their corresponding rewards. The same algorithm also provides the optimal policy matching the optimum values (i.e., **Policy Iteration**). I have implemented the value and policy iteration algorithm in the code below.

```python
def mdp_valueIteration(states, T, R, gamma=0.90, epsilon=0.01):
    # VALUE AND POLICY ITERATION
    # states  : Dictionary with the names of the n states being studied.
    # T       : The transition probability matrices of dim (m,n,n) for m policies.
    # R       : The reward array of dim (n,m), one column per policy.
    # gamma   : Value discount factor, < 1.0.
    # epsilon : Convergence criterion.

    values = np.zeros(len(states))  # Initialize V_0 to zero
    ctime = 0

    # Value iteration - continue until convergence
    while True:
        oldValues = np.copy(values)  # To be used for the convergence check
        values = np.transpose(R) + gamma*np.dot(T, values)  # Value iteration step
        policy = values.argmax(0)    # Take the best policy per state
        values = values.max(0)       # Take the highest value per state
        ctime += 1
        # Check convergence
        if np.max(np.abs(values - oldValues)) <= epsilon:
            break

    return (values, policy, ctime)
```

All we have to do is call the above “mdp_valueIteration” function with the transition probability matrices and respective reward vectors for Policy 0 and Policy 1.

```python
# Call the value-iteration function to get the optimum value and the
# optimum policy per state.
values, policy, time = mdp_valueIteration(states, T, R, gamma=0.90)

print('Optimum State Values: ', np.round(values, 0))
print('Optimum Policy      : ', policy)
print('Optimum Time steps  : ', time)

# Output for churn rate c = 0.05
output>> Optimum State Values: [17 21 25 -10]
output>> Optimum Policy      : [0 0 0 0]

# Output for churn rate c = 0.3
output>> Optimum State Values: [0 1 4 -10]
output>> Optimum Policy      : [1 1 0 0]
```

We have 4 states defined in our subscription decision process: “Start of subscription” (**S0**), “Average life-time subscription” (**S1**), “Long-time subscription” (**S2**) and “Churn” (**S3**). Not surprisingly, we find that the highest value is delivered by the subscribers in the long-time subscription category. For relatively low churn rates, the policy without churn intervention measures (i.e., Policy 0) is the most appropriate. As the churn rate increases, we observe that the best policy (in terms of value optimization between the two policies) for State 0 and State 1 is to have a churn intervention policy (i.e., Policy 1) in place.

In summary, with a more detailed analysis of our dual-policy customer life-cycle decision process we find the following process dynamics as a function of the churn rate.

For comparison, I have also tabulated the value outcome in the case that we do not consider a churn mitigation policy.
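The churn-rate dependence can be reproduced with a small sweep over c, using the same value-iteration step as mdp_valueIteration() (re-implemented inline so the sketch is self-contained; transition structure and rewards as defined earlier):

```python
import numpy as np

def optimal_policy(c, gamma=0.90, epsilon=0.01):
    # Same Bellman backup as in the blog's value-iteration function, inlined
    T = np.array([
        [[0.00, 1-c,  0.00, c   ],   # Policy 0 - no churn intervention
         [0.00, 0.00, 1-c,  c   ],
         [0.00, 0.00, 1-c,  c   ],
         [0.00, 0.00, 0.00, 1.00]],
        [[1.00, 0.00, 0.00, 0.00],   # Policy 1 - churn intervention (back to Start)
         [1.00, 0.00, 0.00, 0.00],
         [1.00, 0.00, 0.00, 0.00],
         [0.00, 0.00, 0.00, 1.00]]])
    R = np.array([[0, 0], [0, 1], [4, 2], [-1, -1]])  # Rewards per policy column
    values = np.zeros(4)
    while True:
        old = np.copy(values)
        q = np.transpose(R) + gamma*np.dot(T, values)  # One backup per policy
        policy, values = q.argmax(0), q.max(0)
        if np.max(np.abs(values - old)) <= epsilon:
            return policy

for c in [0.05, 0.10, 0.20, 0.30]:
    print(c, optimal_policy(c))  # c = 0.05 -> [0 0 0 0]; c = 0.30 -> [1 1 0 0]
```

Sweeping c in finer steps lets you locate the churn rate at which the intervention policy starts to pay off for each state.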

Using the provided code and examples, it is straightforward to consider more complex decision-process applications. The code I have provided is quite general, and your work will be in defining your process’s underlying Markov chain, with its transition probabilities, and the respective decision-process rewards (positive as well as negative). Once that is set up, it is a matter of referencing the provided code functions.

### The hidden Markov model … An intro.

We are often in the situation that we get customer feedback (observable) without being completely sure which underlying (“hidden”) processes may have influenced the customer to provide that particular feedback. A customer may interact in a particular way with our web store (again observable). That customer interaction is likely also to be influenced by hidden processes and behaviors of the underlying system dynamics (e.g., user-interface architecture, front-end interactions, back-end interactions, etc.).

Let’s add some observable “happiness” and “unhappiness” to our customer retention and life-cycle process from the beginning of this blog.

We have already defined the transition probability matrix **T** above. We also need to write the observable sentiments associated with the states in matrix form (this is called the emission matrix in Markovian).

To visualize this a bit better we observe;

The following code function provides the likelihood of observing a sequence of observable states (e.g., Happy, Happy, Not Happy, Happy, Not Happy, Not Happy, …) given a hidden Markov chain describing the underlying process that may influence the observable states.

```python
def hmm_find_prob(seq, T, E, pi):
    # HIDDEN MARKOV CHAIN: LIKELIHOOD OF A SEQUENCE OF OBSERVABLES
    # seq : The observable sequence.
    # T   : Transition probability matrix (hidden states).
    # E   : Emission matrix (observable states).
    # pi  : The steady state of the hidden Markov chain.
    start_state = seq[0]
    alpha = pi*E[:, start_state]
    for i in range(1, len(seq)):
        curr_state = seq[i]
        alpha = (alpha.dot(T))*E[:, curr_state]
    prob = sum(alpha)
    return prob
```

In order to estimate the likelihood of observable sentiment states (i.e., **Happy** and **Not Happy**) of our hidden Markov process (illustrated above), we need to code the matrices and identify the sentiment sequence whose likelihood we would like to assess.

```python
# Setting up the prerequisites for a hidden Markov model.
hidden_states = {0: 'Conversion', 1: 'Retention', 2: 'Upsell', 3: 'Churn'}
observable_states = {0: 'Happy', 1: 'Sad'}
# Note that even if the hidden and observable states are not directly used
# in this example, it is always good discipline to write them out, as they
# tie into both the transition and the emission matrix.

# Transition probability matrix (hidden states).
T = np.array([[0.00, 1.00, 0.00, 0.00],
              [0.00, 0.75, 0.20, 0.05],   # Note we assume 5% churn here!
              [0.00, 1.00, 0.00, 0.00],
              [0.00, 0.00, 0.00, 1.00]])

# Emission matrix (observable states).
E = np.array([[1.0, 0.0],
              [0.7, 0.3],
              [0.8, 0.2],
              [0.0, 1.0]])

pi0 = np.array([1, 0, 0, 0])
pi = steady_state(pi0, T)   # Steady state of the underlying Markov chain
                            # (steady_state was defined earlier in this blog).

seq = [0, 0, 0, 0, 0]   # Happy, Happy, Happy, Happy, Happy
print('Likelihood of having 5 consecutive happy experiences: ',
      np.round(100 * hmm_find_prob(seq, T, E, pi), 0), '%')

seq = [1, 1, 1, 1, 1]   # Not Happy x 5
print('Likelihood of having 5 consecutive negative experiences: ',
      np.round(100 * hmm_find_prob(seq, T, E, pi), 0), '%')
```

output>> Likelihood of having 5 consecutive happy experiences: 4.0 %
output>> Likelihood of having 5 consecutive negative experiences: 78.0 %

We have to draw the conclusion that there is a much higher chance of having 5 consecutive negative experiences (78%) than 5 consecutive positive ones (4%). This should not be too surprising, as the steady state (long-term behavior) of the customer life-cycle process with a 5% churn rate is 77%. As we assess that a customer in the churn state is 100% likely to be unhappy, it is clear that unhappiness would be the overwhelming expectation for the defined process. It should be noted that, as an alternative, we could also run the decision process replacing monetary value with sentiment value and optimize our decision process for customer sentiment.
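The last point, optimizing for sentiment rather than money, can be sketched in a minimal way. This is my own illustration, not the blog's earlier valuation code: it assumes a discount factor (here 0.9) and uses each state's 'Happy' emission probability as the per-step sentiment reward, then solves the linear system v = r + γTv for the expected discounted sentiment value of each state:

```python
import numpy as np

# Transition matrix of the customer life-cycle chain (Churn is absorbing).
T = np.array([[0.00, 1.00, 0.00, 0.00],
              [0.00, 0.75, 0.20, 0.05],
              [0.00, 1.00, 0.00, 0.00],
              [0.00, 0.00, 0.00, 1.00]])

# Hypothetical sentiment reward: the 'Happy' emission probability per state
# (first column of the emission matrix E); Churn yields no sentiment at all.
r = np.array([1.0, 0.7, 0.8, 0.0])

gamma = 0.9   # Assumed discount factor.

# Expected discounted sentiment value: solve (I - gamma*T) v = r.
v = np.linalg.solve(np.eye(4) - gamma * T, r)
print(np.round(v, 2))   # Churn ends up with value 0, as it should.
```

With the rewards swapped in this way, the same value-style machinery ranks states by the customer sentiment they are expected to generate, rather than by money.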

If I could lower my **Churn** rate, in my decision process, down to 1% (from 5%), increasing my **Retention** rate to 79% and keeping the **Upsell** rate at 20%, I would have an 18% chance of 5 consecutive happy experiences and only a 4% chance of 5 unhappy ones.

It is not that difficult to see how these principles can be applied to many different settings in the digital domain interfacing with customers.

### Wrapping it up.

In this blog, I have provided the reader with some insights into how to apply Markov chains and Markov decision processes to important business applications. The Python code snippets you have met throughout this blog can be used directly in much more complex decision processes, or in deeper dives on Markov chains in general.

I have found it difficult to find reasonably comprehensive examples of how to move from Markov chains (or models) to Markov decision processes as they apply to a data-driven business environment. You can find some good examples of applied Markov chains (see **"For further study"** below; I really recommend Normalized Nerd's YouTube videos for an introduction). There is, however, very little material bridging the gap from Markov models to decision processes. While the math behind both Markov chains and Markov decision processes is not very difficult, it is frequently presented in a way that makes it incomprehensible unless you have an advanced degree in mathematics. When you write it out for simpler Markov chains or decision processes, you will see that it is not that difficult (I am still pondering giving a few examples in this blog).

I recommend that you always run a couple of Monte Carlo simulations on your Markov chain and decision process, comparing the outcome with the algorithmic approaches you might apply, such as steady-state derivation, value and policy optimization, etc. If your process contains absorbing states, be extra careful about how they should be reflected in your overall process valuation and optimization. For example, if you consider churn in a customer life-cycle process, this would be an absorbing state, and unless you are careful in your design of the underlying Markov chain, it may overshadow any other dynamics of your process. A Monte Carlo simulation might reveal such issues. Also, start simple, even if your target decision process may be very complex. This will allow you to understand and trust changes as you add complexity to your process.
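Such a Monte Carlo cross-check might look like the sketch below. This is my own minimal illustration, reusing the customer life-cycle chain with its absorbing Churn state; the horizon, seed, and number of runs are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# Customer life-cycle chain from this blog; state 3 (Churn) is absorbing.
T = np.array([[0.00, 1.00, 0.00, 0.00],
              [0.00, 0.75, 0.20, 0.05],
              [0.00, 1.00, 0.00, 0.00],
              [0.00, 0.00, 0.00, 1.00]])

def simulate(T, start, n_steps, rng):
    # Draw one trajectory of the chain and return its final state.
    state = start
    for _ in range(n_steps):
        state = rng.choice(len(T), p=T[state])
    return state

# Fraction of simulated customers absorbed in Churn after 50 steps.
n_sims, horizon = 10_000, 50
churned = sum(simulate(T, 0, horizon, rng) == 3 for _ in range(n_sims))
print(f"Estimated churn after {horizon} steps: {churned / n_sims:.0%}")
```

Here the simulation makes the effect of the absorbing state tangible: even a modest 5% per-step churn rate absorbs the large majority of customers over a 50-step horizon, which is exactly the kind of dynamic that can silently dominate a valuation if you only look at the algebra.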

I hope that my code snippets will likewise make this field more approachable. And if you want to deep dive into the math as well, you will find some good starting points in my literature list below.

*“We should not fault an agent for not knowing something that matters, but only for having known something and then forgotten.”, The unremembered.*

### Acknowledgement.

I greatly acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this blog. Also, many of my Deutsche Telekom AG, T-Mobile NL and industry colleagues have in countless ways contributed to my thinking and the ideas leading to this little blog. Thank you!

### For further study.

Romain Hollander, “On the policy iteration algorithm for PageRank Optimization”, MIT Report (June 2010).

Kim Kyllesbech Larsen, “Data-driven decision making … what’s not to like about that?”, LinkedIn Article (November 2021).

On Markov chains in particular, I recommend **Normalized Nerd**'s lectures (super well done and easy to grasp, respect!). I recommend having a Python notebook on the side and building up the lectures there. In any case, if this is new to you, start here: "Markov Chains Clearly Explained! Part – 1" (there are 7 parts in total).

Somnath Banerjee, “Real World Applications of Markov Decision Process”, towardsdatascience.com, (January 2021). Source of examples that can be worked out with the Python codes provided in this blog.

Ridhima Kumar, “Marketing Analytics through Markov Chain”, towardsdatascience.com, (January 2019). Source of examples that can be worked out with the Python codes provided in this blog.

Richard Bellman, “The theory of dynamic programming”, The RAND Corporation, P-550 (July 1954). A classic with the strength of providing a lot of intuition around value and policy iteration.

Dorsa Sadigh, Assistant Professor at Stanford University’s Computer Science Department, “Markov Decision Processes – Value Iteration | Stanford CS221”, (Autumn 2019).

Neil Walton, Reader at University of Manchester’s Department of Mathematics, “Algorithms for MDPs” (2018). Very good lectures on Markov Decision Processes and the algorithms used. They are somewhat mathematical but the examples and explanations given are really good (imo).

Rohan Jagtap, "Understanding Markov Decision Process (MDP)", towardsdatascience.com, (September 2020). Providing an intuitive as well as mathematical treatment of value iteration. For the mathematically inclined, this is a very good treatment of the topic.

A. Aylin Tokuc, “Value iteration vs Policy iteration in Reinforcement learning”, (October 2021). Provides a nice and comprehensible overview of value and policy iteration.

Paul A. Gagniuc, “Markov Chains – From theory to implementation and experimentation“, Wiley, (2017). Lots of great examples of applied Markov Chains.

Richard J. Boucherie & Nico M. van Dijk, “Markov Decision Processes in Practice“, Springer (2017). Great book with many examples of MDP implementations.

Ankur Ankan & Abinash Panda, "Hands-On Markov Models with Python", Packt (2018). Provides many good ideas and inspiration for how to code Markov chains in Python.

Brian Hayes, “First Links in the Markov Chain”, American Scientist (March-April 2014). Provides a very easy to read and interesting account of Markov Chains.

Kim Kyllesbech Larsen, “MarkovChains-and-MDPs“, The Python code used for all examples in this blog, (December 2021). The link does require you to have a Github account.