I am using a RcppHMM package to make a GHMM(Multivariate gaussian mixture HMM model) with continuous observation.
I want to learn an EM algorithm using continuous observations with different sequence lengths. To be specific, each observation has a different sequence length from 3 to 6. I tried to fit the model using the whole observation dataset at once (I made the dataset with ncol=6(maximum sequence length) and filled the empty part with all zero), but it didn't work so I separated observations as groups with the same lengths [O3, O4, O5, O6] and updated the model by each group.
Each observation group looks like this
O3
[,1] [,2] [,3]
[1,] 0.8550940 0.3231340 0.8639223
[2,] 0.4453262 0.5840305 0.4356958
[3,] 0.4344789 -1.2234760 0.4344789
[4,] -0.5003085 3.0322560 -0.5003085
[5,] -0.1459598 -0.4661041 -0.1459598
[6,] -0.1977263 -0.6352724 -0.1977263
O4
[,1] [,2] [,3] [,4]
[1,] 0.8965332 0.3338220 0.7270241 0.8824540
[2,] 0.4033438 0.4131293 0.1593136 0.4187023
[3,] -0.7329015 -1.6828296 -0.1550487 -0.1550487
[4,] -0.3213490 7.3449076 -0.2787857 -0.2787857
[5,] -0.2868067 -0.3743332 -0.1340566 -0.1340566
[6,] 2.6832742 -0.5844305 0.2320774 0.2320774
O5
[,1] [,2] [,3] [,4] [,5]
[1,] 0.83401341 0.2492370 0.47493190 0.6440035 0.84985396
[2,] 0.37988234 0.2335883 0.17043570 0.2116066 0.36260248
[3,] -0.05240445 -0.3034002 -0.05240445 -0.3034002 -0.05240445
[4,] -0.37240867 1.1500528 -0.37240867 1.1500528 -0.37240867
[5,] -0.02056839 0.9343497 -0.02056839 0.9343497 -0.02056839
[6,] -0.27586584 -0.4406833 -0.27586584 -0.4406833 -0.27586584
O6
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.9287066 0.35065802 0.4493442 0.6142040 0.7423286 0.9217381
[2,] 0.3852644 0.09612516 0.1623447 0.1320334 0.1875127 0.3928661
[3,] 0.1436024 -0.08326038 0.7800491 0.1436024 0.1926751 0.1436024
[4,] -0.4284304 -0.27916609 -0.5224586 -0.4284304 0.1267840 -0.4284304
[5,] -0.8846364 -0.81131525 -0.1781479 -0.8846364 -0.1266250 -0.8846364
[6,] -0.2141231 -0.78377461 -0.4440142 -0.2141231 -0.7888260 -0.2141231
nrow is the number of dimension of observation, and ncol is lengths of sequences.
When I updated the model with the first group that has sequence length 3, it operated. But when I tried to re-update model with second group that has sequence length 4, the warning message came out as below,
In learnEM(newModel, O4[, 1:4, ], iter = 20, delta = 1e-05, print = TRUE) : It is recommended to have a covariance matrix with a determinant bigger than 1/ ((2*PI)^k) .
Does anyone know how to fix this warning message? And is there any proper way to learn a EM algorithm with observations that have different sequence lengths using this package?