# [ml] Hidden Markov Models

Josh Myer josh at joshisanerd.com
Tue Jul 13 18:00:15 PDT 2010

```On Tue, Jul 13, 2010 at 2:16 PM, Glen Jarvis <glen at glenjarvis.com> wrote:

> I was invited to this list for questions I have about Hidden Markov Models.
> Please forgive the extra chatter.
>
>
This is exactly what this list is for!

"A HMM can be considered as the simplest dynamic Bayesian network."

Eek!  That's an awful place to start from, unless you already know Bayesian
networks (if you do, I have a very dense 600 page book I'd love to get an
explanation of).  It's like saying a NOT gate is the simplest quantum
tunneling gate, or something similarly startlingly "This is the simplest
example of a [insert family of obscure and generally brain-fucking objects
here]."  Bayes nets are nasty awful things to implement, and HMMs are
incredibly much simpler.

Stick to the "Concrete Example" on wikipedia.  I did a presentation on it in
the last ML incarnation; it might have been one of the things we programmed
our way through.  Here's that presentation as a correspondence course:

There are two big conceptual hurdles to understanding HMMs:

1) the hidden part (which is always poorly explained, because it's
impossible to explain in a straightforward and concise way, and academics
choose concise), and

2) the dynamic programming way (which is typically poorly explained, because
it's difficult to explain in a straightforward and concise way, and

Load up the "Concrete Example" section of wikipedia's HMM page.  Do not
click any other links, especially not the Markov Model page, as it's obtuse
and useless to a beginner.  Start with writing something that generates
random observations from the underlying model, to get a feel for what an HMM
is modeling.  It's not intuitive at first, and then it suddenly clicks and
you get it.  You will then forget, and repeat the moment of enlightenment
several times as you see different confusing aspects of the model (unless
you've been thinking probabilistically for a long time).

Once you get the idea of how the HMM is "hiding" its state, and have an
intuitive feeling for how the combined probabilities work on the generation
side, you can step back to conditional probabilities bit.  At that point,
you want to make a big hidden*observed matrix on a piece of paper, draw a
toy model on another piece of paper.  Make the probabilities friendly.  From
there, start at the very beginning of the piece of paper, and do what your
software did above.  Do this for 4 or 5 columns, which is a lot of time
spent copying numbers and feeling how the cells interconnect.  As you fill
in each cell, draw arrows between cells if you need to, and say out loud
what's happening.  Do this in the privacy of your own home.  But, really, do
talk yourself through it, it's very helpful.

You can implement this process (it's very computationally expensive, but
that doesn't matter for our toy problems), and have one thing that
implements the use of HMMs to explain a sequence of observations.  It's only
useful for pedagogy, so, write it, commit it to your dvcs of choice, and

Go read up on Levenshtein distance.
http://en.wikipedia.org/wiki/Levenshtein_distance -- this is dynamic
programming.  It's the class of things Viterbi fits into.  Once you have
implemented this and it works, make sure you understand how dynamic
programming worked there.

Next, go to the wikipedia page for the viterbi algorithm.  It's a dynamic
algorithm that has a bunch of cells, much like levenshtein, but, this time,
you're keeping track of each transition (so you have the whole string).  In
Levenshtein, this is equivalent to remembering the sequence of commands you
used to edit the string.  In the cells, instead of an edit distance, you
have the probability that you went into that state.  Unlike Levenshein, this
path isn't necessarily a connected sequence of cells (you can transition out
to any of the next states at each point; I'm pretty sure this actually a
higher-dimensional walk and they are therefore adjacent in the next
dimension, but haven't bothered to think about it).  Anyway, you've got
cells that get highlighted one per column, as in viterbi, which show the
most probable explanation of how you got to that observation.

```