I want to implement a classic Markov model problem: train a Markov model to learn English text patterns, and use it to distinguish English text from random strings.
I decided to use hmmlearn so I don't have to write my own. However, I am confused about how to train it. It seems to require the number of components in the HMM, but what is a reasonable number for English? Also, could I not use a simple higher-order Markov model instead of a hidden one? Presumably the interesting property is the pattern of n-grams, not hidden states.
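
For concreteness, here is a rough sketch of the kind of higher-order (non-hidden) model I have in mind. It does not use hmmlearn; it just counts character n-grams in plain Python. The class name `CharMarkovModel`, the order of 2, the 27-character alphabet, the add-one smoothing, and the toy training sentence are all my own assumptions for illustration, not something I know to be the standard approach.

```python
import math
from collections import defaultdict


class CharMarkovModel:
    """Order-k character Markov model with add-one smoothing (illustrative sketch)."""

    def __init__(self, order=2, alphabet="abcdefghijklmnopqrstuvwxyz "):
        self.order = order
        self.alphabet = alphabet
        # counts[context][next_char] = how often next_char followed context
        self.counts = defaultdict(lambda: defaultdict(int))
        # totals[context] = how often context was seen with any next character
        self.totals = defaultdict(int)

    def _normalize(self, text):
        # Lowercase and drop anything outside the assumed alphabet.
        return "".join(c for c in text.lower() if c in self.alphabet)

    def train(self, text):
        text = self._normalize(text)
        for i in range(self.order, len(text)):
            context = text[i - self.order:i]
            self.counts[context][text[i]] += 1
            self.totals[context] += 1

    def avg_log_prob(self, text):
        """Average log-probability per predicted character (higher = more English-like)."""
        text = self._normalize(text)
        n = len(text) - self.order
        if n <= 0:
            raise ValueError("text is too short for this model order")
        vocab = len(self.alphabet)
        total = 0.0
        for i in range(self.order, len(text)):
            context = text[i - self.order:i]
            # .get avoids creating entries for unseen contexts while scoring
            seen = self.counts.get(context, {}).get(text[i], 0)
            # Add-one smoothing so unseen n-grams get a small nonzero probability.
            prob = (seen + 1) / (self.totals.get(context, 0) + vocab)
            total += math.log(prob)
        return total / n


# Toy training text; a real run would need a much larger English corpus.
corpus = "the quick brown fox jumps over the lazy dog and the cat sat on the mat " * 20
model = CharMarkovModel(order=2)
model.train(corpus)

english_score = model.avg_log_prob("the cat sat on the mat")
random_score = model.avg_log_prob("xqzjv kwpqz vjxq")
print(english_score, random_score)
```

The idea would be to pick a threshold on the average log-probability, with anything scoring below it treated as a random string. I am not sure how to choose the order or the threshold well, which is part of what I am asking.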