I am using weka.classifiers.bayes.HMM to try to classify some of my data, but I can't find any examples of exactly what my ARFF file should look like. The documentation wasn't really clear to me.

I understand that HMMs require time-series data. My question is how to represent that in my dataset. Am I supposed to add another numeric index attribute in front of each feature line? For example, here are 3 of my feature lines (there are tens of thousands in total, but all follow this format):

2,2.217950,2.235440,0.031252,2.224833,2.301141,0.093227,1.940765,1.973835,0.064434,1
2,2.216870,2.235608,0.035570,2.217950,2.235440,0.031252,2.023161,2.531513,0.623939,1
2,2.216577,2.246109,0.045806,2.216870,2.235608,0.035570,2.497010,2.529199,0.050049,1

Each line contains several energy readings, and the lines are listed in sequential order: the 1st line came first, the 2nd line came 1 second after it, the 3rd line came 1 second after the 2nd, and so on.

How do I use HMM in Weka to train on this set? (Yes, I know I'll need a separate test dataset that is also a time series.)

Thanks!!

stellarowl12

I forgot to mention that I need 5 states: 1, 2, 3, 4, 5. They should go from one to another with certain probabilities. For example, if the current time slot is in state 3, it is more likely to go to states 2 and 4 than to states 1 and 5. The state is the last value on each line (edited below as an example):

2,2.217950,2.235440,0.031252,2.224833,2.301141,0.093227,1.940765,1.973835,0.064434,1
2,2.216870,2.235608,0.035570,2.217950,2.235440,0.031252,2.023161,2.531513,0.623939,2
2,2.216577,2.246109,0.045806,2.216870,2.235608,0.035570,2.497010,2.529199,0.050049,3

– stellarowl12 Jun 22 '13 at 18:26

2 Answers

From the HMMweka homepage:

The HMM classifier only work on sequence data, which in Weka is represented as a relational attribute. Data instances must have a single, Nominal, class attribute and a single, relational, sequence attribute [...]

ivcandela

I've had the same problem. I'm likewise new to this, so any corrections will be much appreciated, but here's what I've figured out.

There is a useful example in the download, specifically numericsequence.arff. The format you want works like this:

@relation relation_name
@attribute name_of_instance_attribute {instance_0,instance_1,...instance_n}
@attribute class {relation_type_0, relation_type_1, ... relation_type_n}
@attribute name_of_sequence relational 
  @attribute sequence_variable_0 type 
  @attribute sequence_variable_1 type
@end name_of_sequence
@data

instance_0,relation_type_n,'5,6\n7,8\n9,10'
instance_1,relation_type_n,'2,3\n4,5\n6,7'

If you are writing a program to generate your ARFF file, be sure to insert the literal two-character text "\n" (a backslash followed by n) between the rows of a sequence instead of a real line break. It seems to want the literal '\n' and not an actual line break.
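As a rough sketch of such a program, the Python below builds an ARFF file in the layout shown above. The function names (`relational_value`, `build_arff`) and the attribute names (`id`, `class`, `sequence`, `f0`, `f1`, ...) are made up for illustration, and how you cut your readings into separate sequences is an assumption you would have to settle for your own data.

```python
def relational_value(rows):
    """Join the rows of one sequence into a single quoted ARFF value.

    Rows are separated by the two-character text backslash + n,
    not by a real line break.
    """
    joined = "\\n".join(",".join(str(v) for v in row) for row in rows)
    return "'" + joined + "'"


def build_arff(relation, class_values, n_features, sequences):
    """Return ARFF text; sequences is a list of (id, class_label, rows)."""
    ids = ",".join(seq_id for seq_id, _, _ in sequences)
    lines = [
        "@relation " + relation,
        "@attribute id {" + ids + "}",
        "@attribute class {" + ",".join(class_values) + "}",
        "@attribute sequence relational",
    ]
    # One numeric attribute per reading inside each row (hypothetical names).
    for i in range(n_features):
        lines.append("  @attribute f%d numeric" % i)
    lines.append("@end sequence")
    lines.append("@data")
    for seq_id, label, rows in sequences:
        lines.append(",".join([seq_id, label, relational_value(rows)]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two toy sequences reusing the numbers from the template above.
    example = [
        ("seq_0", "1", [[5, 6], [7, 8], [9, 10]]),
        ("seq_1", "2", [[2, 3], [4, 5], [6, 7]]),
    ]
    print(build_arff("energy", ["1", "2"], 2, example), end="")
```

Each data line ends up as one physical line in the file, with the whole sequence packed into the final quoted field.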

Empiricist