Home    Personal    Work    Computers    Miscellaneous

Research    Curriculum Vitae    PhD thesis

Finite Markov Models

Note: I have moved anthony.liekens.net to a new machine recently, and there are some problems with dvi2png and such. As a result, the latex formulas in this web page are broken until I find a way to get dvipng compiled on our brand new openbsd machine

Abstract

Finite Markov models can be used to model stochastic transitions in finite state spaces. This page gives an introduction of these models, using an example of a very simplistic weather model. With this model, we try to predict tomorrow's weather stochastically, based on today's weather, and observed expectations of transitions between sunny and rainy days. We can also easily study the average weather according to our observations, by determining the eigenstate of the system. The model and methods described here can easily be extended to complex stochastic systems with many states.

The weather model

States

In our very simple model, we assume that on a given day, it either rains, or it is sunny. As such, for each day, we can be in either state $s_{sun}$? or $s_{rain}$?. As our state space $S=\left\{s_{sun},s_{rain}\right\}$? consists of a finite number of states (i.e. S|=2$), the Markov model we're building here is called a finite Markov model.

Modeling observations

Suppose that we have learned from observations that if today's weather is sunny, it'll rain tomorrow with a 60% chance. The probability that it is sunny two days in a row is thus 40%. More formally, we can write this down as $P[s_{sun}\rightarrow s_{rain}]=0.6$? and $P[s_{sun}\rightarrow s_{sun}]=0.4$?. Similarly, we have observed that $P[s_{rain}\rightarrow s_{rain}]=0.7$? and $P[s_{rain}\rightarrow s_{sun}]=0.3$?. These transition probabilitiess are a mapping $S\times S\rightarrow\mathbb{R}$?.

We already get an intuitive feeling that our weather model on average will not predict good weather, as the probabilities of having a rainy day is very high. A proper analysis of the Markov model we're about to build will show how much the sun shines and how much it rains on average, based on these transitional observations.

Building a probability transition matrix

We can summarize these transition probabilities in one matrix, called a probability transition matrix $Q$?. In this matrix, $Q_{j,i}$? gives the probability that we move from state i to state j. For our weather model, the transition probability matrix thus becomes \Q=\left.\?

The probabilities for sunny state are denoted by the first row and column, whereas the probabilities for the rainy state are in the second row and column. Note that the sum of the probabilities in each column equals one, as the column denotes the distribution of the states at the next day, given today's weather.

Transient behavior of the model

Probability distributions

An expectation of our weather model can now be represented as a probability distribution over our two states. Let column vector $\rho=\left^T$? denote such a vector, with _i$? denoting the probability of being in state $s_i$?. Note that since the model represents a distribution, $\sum_{i\in S}p_i=1$? must always hold.

Iterating the model

If $\rho_t$? is such a vector describing the weather on day t, than we can compute the distribution $\rho_{t+1}$? over the states for day t+1 by multiplying the transition matrix with this vector, with $\rho_{t+1}=Q\rho_{t}$?. Similarly, the probability distribution of the weather on day t+k can be computed with $\rho_{t+k}=Q^k\rho_{t}$?.

As an example, suppose that it rains today. This can be represented with vector $\rho_0=^T$?. We can now compute the expectations for tomorrow's weather with our model with $\rho_1=Q\rho_0$?, and the weather in k days can be computed with $\rho_k=Q^k\rho_0$?.

As such, the expectations for rain and sun will be like the following: \\rho_0=\left\rightarrow\rho_1=\left\rightarrow\rho_2=\left\cdots\?

We can use these steps of the iteration process to predict the weather in the coming days. As such, if it rains tomorrow, there's a 70% chance of rain for tomorrow, and a 67% chance of rain the day after tomorrow. Of course, this is based on the observations we made when creating the Markov chain's probability transition matrix. If different observations are implemented in the matrix, the predictions of future weather obviously depends on this.

We can build a similar chain if we start with a sunny day: \\rho_0=\left\rightarrow\rho_1=\left\rightarrow\rho_2=\left\cdots\?

As a result, you can study the transient behavior of the model by simply iterating the multiplication of the transition probability matrix with an initial distribution vector.

Limiting behavior

As you iterate the model for a couple of generations, the predictions of the model become similar. In our model, this average behavior shows up quickly since the redictions for sunny and rainy weather are similar. If we iterate long enough (e.g. an infinite number of days), we see that the system converges to a distribution $\rho_\infty=\lim_{t\rightarrow\infty}\rho_t=\lim_{t\rightarrow\infty}Q^t\rho_0=\left^T$?. This observation is independent of the initial distribution (e.g. whether we start with a rainy or sunny day).

A couple of sections ago, we had an intuition that the model would predict a lot of bad weather on average. The limiting behavior of our model indeed shows that this intuition is correct. If a simulation of the model is being run, you will see that, on average, it is rainy on twice the number of sunny days, if we run the model long enough. This is true because of the weak law of large numbers.

By studying the model, and by using some long existing theorems in discrete mathematics, we can compute this limiting behavior from the transition probability matrix Q.

Fixed point distribution

Note that $\rho_\infty$? is a fixed point of the system, with $\rho_\infty=Q\rho_\infty$?. In discrete mathematics, this fixed point distribution is called a stochastic eigenvector of $Q$? with associated eigenvalue 1. We will use a well known theorem to show that this behavior is unique if some properties of the transition matrix hold.

The Perron-Frobenius Theorem

This theorem from discrete mathematics shows that if transition probability matrix Q is ergodic, then there exists a unique stochastic eigenvector of this matrix with corresponding eigenvalue such that this eigenvector is the fixed point or limiting distribution over the states of the system, independent of the initial distribution. Also, the theorem shows there is no eigenvalue of the system larger than 1.

This theorem very much applies to the limiting behavior of the stochastic systems we can build with finite Markov models.

Ergodicity

A Markov model's transition matrix is ergodic if two properties of the transition matrix can be shown to be true:

  • irreducibility: there must be a strictly positive probability to travel from any state in the system to any other state in the system, in a finite number of steps
  • aperiodicity: no part of the transition matrix may cause periodic jumping between states. As an example, transition matrix $\left$? is a matrix that is irreducible, as all states can be reached in a finite number of steps, but it is periodic as the system will constantly jump from one state to the other. No unique eigenvector with eigenvalue 1 of this system thus exist that would give us the limiting or fixed point behavior

Theorems exist that can simplify the discovery of these properties of new transition matrices you may be building. A very helpful theorem states that if the system is irreducible, and at least on of the elements on the diagonal of the transition matrix is strictly positive, than the whole transition matrix is aperiodic (and as a result, ergodic). In a similar context, if it can be shown for a transition matrix that all of its elements are strictly positive, than the matrix is irreducible and aperiodic, and thus ergodic.

Summary

This page introduces a couple of basic concepts of finite Markov models, using a very simple examplary state space. Of course, the model can be extended to have a lot more states (and thus, a lot more transition probabilities). The methodology used for bigger matrices is the same for the small matrices we've been using in this page. Transient behavior can be studied by multiplying the transition matrix with an initial distribution over the states, and the limiting or fixed point distribution of ergodic Markov models can be determined by computing the system's unique stochastic eigenvector with eigenvalue 1.

Further reading

Edit (locked) - History - Recent Changes - Search - Statistics
All contents copyrighted 2000-2006 Anthony Liekens unless otherwise noted. Powered by PmWiki
Page last modified on October 14, 2005, at 01:45 PM.