Hidden Markov Models for gene finding (caricature)
Most current gene-finding programs are based on Hidden
Markov Models. These work as follows: assume (wrongly)
that the DNA-sequence has been generated randomly by a
Markov model that can be in one of two states: “gene” or
“intergenic region.” Each state has a characteristic
probability of “emitting” a given nucleotide, and has a
characteristic (low) probability of switching to the other
state. The observer sees the sequence of emissions
(nucleotides), but which state emitted any given
nucleotide is hidden from the observer.
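Below is a minimal Python sketch of this two-state caricature. It uses made-up parameters: a GC-rich "gene" state, an AT-rich "intergenic" state, and a 1% chance of switching state at each position. The names `EMIT`, `SWITCH`, `generate` and `viterbi` are hypothetical. `generate` plays the role of the random process the model assumes. Recovering the hidden states from the emissions is done here with the Viterbi algorithm, one standard HMM decoding method. The text above does not itself say how the hidden states are inferred.

```python
import math
import random

STATES = ("gene", "intergenic")

# Hypothetical parameters, chosen only for illustration.
# Emission probabilities: "gene" is assumed GC-rich, "intergenic" AT-rich.
EMIT = {
    "gene":       {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "intergenic": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}
SWITCH = 0.01                                # low probability of changing state
START = {"gene": 0.5, "intergenic": 0.5}     # uniform initial state


def trans(a, b):
    """Transition probability from state a to state b."""
    return SWITCH if a != b else 1.0 - SWITCH


def generate(n, rng):
    """Simulate n steps; return (hidden_states, observed_nucleotides)."""
    state = rng.choice(STATES)               # uniform start, matching START
    states, seq = [], []
    for _ in range(n):
        states.append(state)
        probs = EMIT[state]
        # Emit one nucleotide according to the current state's probabilities.
        seq.append(rng.choices(list(probs), weights=list(probs.values()))[0])
        if rng.random() < SWITCH:            # occasionally flip to the other state
            state = STATES[1] if state == STATES[0] else STATES[0]
    return states, "".join(seq)


def viterbi(seq):
    """Most probable hidden-state path for seq, computed in log space."""
    log = math.log
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []                                # back[i][s]: best predecessor of s at i+1
    for nt in seq[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log(trans(p, s)))
            ptr[s] = prev
            new[s] = score[prev] + log(trans(prev, s)) + log(EMIT[s][nt])
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    path = [max(STATES, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Usage: the observer only gets `observed`; `hidden` is kept for comparison.
rng = random.Random(0)
hidden, observed = generate(2000, rng)
decoded = viterbi(observed)
accuracy = sum(h == d for h, d in zip(hidden, decoded)) / len(hidden)
print(observed[:60])
print(f"fraction of positions decoded correctly: {accuracy:.2f}")
```

Because the two states differ only slightly in composition, the decoder relies on the low switching probability. It prefers long runs of one state over frequent flips. Short or weakly biased stretches are therefore easily mislabeled, which is part of why this is only a caricature of real gene finders.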