
I am trying to fit an HMM using hmmlearn, but I get this warning, which I don't fully understand:

Fitting a model with 117917879 free scalar parameters with only 550034 data points will result in a degenerate solution.

I use quite a large dataset, but I don't understand where the 117917879 free scalar parameters come from, and what it means to have a degenerate solution.

I define my hmm as follows:

import numpy as np
from hmmlearn import hmm

# vocab_size = 10858, is the number of states
model = hmm.GaussianHMM(n_components=vocab_size, covariance_type="full")

# frequency_list = list of length 1058, containing the initial probability of each state
model.startprob_ = np.array(frequency_list)

# transitions is a (10858, 10858) array containing the transition probabilities
model.transmat_ = np.array(transitions)

# integer_array = My data converted to an array (size = 550034)
integer_array = integer_array.reshape(-1, 1)
model.fit(integer_array)

Could anyone help me improve, or at least explain where the scalar parameters come from, and what a degenerate solution is?

mkrieger1
WibeMan
    I think that your 117,917,879 free scalar parameters come from the transition matrix, having 10858x10858 (technically 10859 elements on each axis if zero-indexed). 10859^2 gives you 117,917,881, which is 2 out but seems likely to have something to do with it. Can't help on the degenerate solution I'm afraid, but at a guess is it saying that your 550034 can't model all 117,917,881 transitions? – djjavo Jul 16 '20 at 09:33

1 Answer


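This is because an N-state HMM has about N^2 free parameters, almost all of them in the N x N transition matrix. With far more parameters than data points, many different parameter settings fit the data equally well, so there are multiple solutions.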
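The exact number in the warning can be reproduced by counting the free parameters by hand. The sketch below is my own breakdown and is an assumption about how the count is formed, not code taken from hmmlearn: N - 1 initial probabilities, N(N - 1) transition probabilities (each row sums to 1), plus one mean and one full covariance per state. The helper name `gaussian_hmm_free_params` is hypothetical. With N = 10858 states and one feature it gives the figure from the question.

```python
def gaussian_hmm_free_params(n_components, n_features):
    """Count free scalars of a Gaussian HMM with full covariances (assumed breakdown)."""
    # Initial distribution: N probabilities that sum to 1.
    startprob = n_components - 1
    # Transition matrix: N rows, each with N probabilities that sum to 1.
    transmat = n_components * (n_components - 1)
    # One mean vector per state.
    means = n_components * n_features
    # One symmetric covariance matrix per state: d * (d + 1) / 2 free entries.
    covars = n_components * n_features * (n_features + 1) // 2
    return startprob + transmat + means + covars


n_states = 10858      # vocab_size from the question
n_samples = 550034    # number of data points from the question

total = gaussian_hmm_free_params(n_states, n_features=1)
print(total)               # 117917879
print(total > n_samples)   # True: far more parameters than observations
```

The transition matrix alone accounts for 117,885,306 of these, so the only way to bring the count below the number of data points is to use far fewer states.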

Think about a + bx = y: if you have two points (x1, y1) and (x2, y2), you can solve for a and b uniquely, but if you only have one point (x1, y1), there are infinitely many pairs (a, b) that fit it. This is what is meant by a degenerate solution.
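As a small illustration of the line example, here is a sketch with made-up numbers (the points (2, 5) and (4, 9) are assumptions chosen only for the demonstration). With one point, several different (a, b) pairs fit exactly; adding a second point pins down a single pair.

```python
# Model: a + b * x = y, with a single observation (x1, y1).
x1, y1 = 2.0, 5.0


def fits(a, b, x, y):
    """Return True if the line a + b * x passes exactly through (x, y)."""
    return a + b * x == y


# Three different parameter pairs that all pass through the one point.
candidates = [(5.0, 0.0), (1.0, 2.0), (-1.0, 3.0)]
all_fit = all(fits(a, b, x1, y1) for a, b in candidates)
print(all_fit)  # True: one point cannot tell these lines apart

# A second observation removes the ambiguity.
x2, y2 = 4.0, 9.0
b = (y2 - y1) / (x2 - x1)  # slope from the two points
a = y1 - b * x1            # intercept from the first point
print(a, b)  # 1.0 2.0

# Only one of the earlier candidates survives the second point.
survivors = [(ca, cb) for ca, cb in candidates if fits(ca, cb, x2, y2)]
print(survivors)  # [(1.0, 2.0)]
```

The HMM in the question is in the one-point situation, only with roughly 118 million unknowns and 550 thousand observations.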

luis