
I am brand new to WEKA and ML, so please excuse my ignorance. I've spent several hours trying to figure this out, so hopefully someone can point me in the right direction:

I am trying to run a J48 decision tree on data for USDJPY. The data was loaded from a .csv file, and the class attribute is nominal: TRUE or FALSE, depending on whether USDJPY was trading more than 1% higher after 20 sessions. The problem is that when I run the algorithm, the decision tree simply uses the class value to solve the problem, which is useless. There are 22 attributes other than the class attribute from which I am looking to predict the class attribute.

When comparing my dataset to the example "glass" dataset, I cannot find any difference between the two that would explain my problem. "glass.arff" works as expected when I run J48 (with identical settings) to predict the class value (type of glass) from the other attributes (i.e., it gets some guesses wrong).

What am I missing here? Here is a list of the attributes:

@ATTRIBUTE date NUMERIC
@ATTRIBUTE open NUMERIC
@ATTRIBUTE high NUMERIC
@ATTRIBUTE low NUMERIC
@ATTRIBUTE close NUMERIC
@ATTRIBUTE 1daypctchg NUMERIC
@ATTRIBUTE smavg50onclose NUMERIC
@ATTRIBUTE smavg100onclose NUMERIC
@ATTRIBUTE smavg200onclose NUMERIC
@ATTRIBUTE ubb2 NUMERIC
@ATTRIBUTE bollma2 onclose NUMERIC
@ATTRIBUTE lbb2 NUMERIC
@ATTRIBUTE bollwjpybgn NUMERIC
@ATTRIBUTE %bjpybgn NUMERIC
@ATTRIBUTE rsi NUMERIC
@ATTRIBUTE ma50>100 {FALSE,TRUE}
@ATTRIBUTE ma50>200 {FALSE,TRUE}
@ATTRIBUTE ma100>200 {FALSE,TRUE}
@ATTRIBUTE up1pct5d? {FALSE,TRUE}
@ATTRIBUTE up1pct20d? {FALSE,TRUE}
@ATTRIBUTE dwn1pct5d? {FALSE,TRUE}
@ATTRIBUTE dwn1pct20d? {FALSE,TRUE}
trock2000
  • Are you using the Weka UI or the Java API? – stackoverflowuser2010 Sep 17 '16 at 18:31
  • I am using the Weka UI – trock2000 Sep 17 '16 at 19:09
  • Are you marking the class column as the class in the UI? That will make the algorithm avoid using the class as a feature. – stackoverflowuser2010 Sep 17 '16 at 19:13
  • How do I do that? I thought that the last (right-most) column in your dataset defaults to the class? I also confirmed the right-most column is bold in the preview window (if that means anything). I even tried changing the class via the drop-down menu in the Preprocess and Classify tabs. Am I missing something? – trock2000 Sep 17 '16 at 19:33
  • Yes, the right-most column should be the class. If you followed all the steps to identify the correct column for the class, then I don't know what the problem is. Can you provide a link to the dataset? – stackoverflowuser2010 Sep 17 '16 at 21:59
  • Also, what do you mean by "When I run the algorithm, the decision tree is simply using the class value to solve the problem"? Do you mean that the classification result has 100% accuracy, or that the tree representation in the J48 output shows that the top decision node is the class value? – stackoverflowuser2010 Sep 17 '16 at 22:00
  • I don't have a link, unfortunately. It is 45 years of daily data for USDJPY downloaded from Bloomberg into a csv file. How else can I share it with you directly? – trock2000 Sep 17 '16 at 22:15
  • Can you post like 20 lines of the data and say what the columns are? – stackoverflowuser2010 Sep 17 '16 at 22:21
  • Sure, but where can I do that here? It is too big and not formatted for the comment section. – trock2000 Sep 17 '16 at 22:23
  • Just add it to the bottom of your main question and apply "Code sample" formatting. You said it was just 7 attributes, right? – stackoverflowuser2010 Sep 17 '16 at 22:26
  • I now have a big clue as to what the issue is. My .csv actually had 22 attributes (it had seven earlier, sorry for the confusion), so I cut it down to 7 to post for you here. I ran J48 on the 7-attribute file first, and it worked correctly on that new file! Shouldn't I be able to use 22 attributes, though? – trock2000 Sep 17 '16 at 22:36
  • It should be able to use any number of attributes as long as the class is consistently in the same column. – stackoverflowuser2010 Sep 17 '16 at 22:37
  • If my answer helps solve your problem, please accept it below. – stackoverflowuser2010 Sep 17 '16 at 22:40
  • What do you mean exactly by "as long as the class is consistently in the same column"? It still isn't working. The formatting was off when I tried to post the dataset header, but I did add the attribute list in .arff format. Can you see anything off, by chance? – trock2000 Sep 17 '16 at 22:44
  • You said you had CSV data. Make sure all rows have the same number of fields, and make sure that the class is always in the last column. – stackoverflowuser2010 Sep 17 '16 at 22:49
  • Also, you may want to remove unusual characters from your attribute names, like ">" or "%" or "?", etc. – stackoverflowuser2010 Sep 17 '16 at 23:00

1 Answer


Weka (and its J48 implementation) should be able to classify your data as long as the ground-truth class is consistently in the same column of your .csv file.
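
The two checks suggested in the comments (every row has the same number of fields, and attribute names contain no unusual characters) can be run on the file before loading it into Weka. The following is a minimal sketch in Python, not part of Weka itself. It assumes the CSV has a header row and no quoted fields that span several lines. The helper names `check_rows` and `sanitize_name` and the sample data are made up for illustration.

```python
import csv
import io
import re


def check_rows(csv_text, delimiter=","):
    """Return (expected_field_count, bad_rows).

    bad_rows holds (line_number, field_count) for every data row whose
    field count differs from the header's field count.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    bad_rows = []
    for row in reader:
        if len(row) != expected:
            bad_rows.append((reader.line_num, len(row)))
    return expected, bad_rows


def sanitize_name(name):
    """Replace runs of characters other than letters, digits, and '_' with '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()).strip("_")
    return cleaned or "attr"


# Hypothetical sample: the last data row is missing its class value.
sample = (
    "open,close,ma50>100,up1pct20d?\n"
    "1.0,1.1,TRUE,FALSE\n"
    "1.2,1.3,TRUE\n"
)

expected, bad = check_rows(sample)
print(expected, bad)  # 4 [(3, 3)]

header = sample.splitlines()[0].split(",")
print([sanitize_name(n) for n in header])
# ['open', 'close', 'ma50_100', 'up1pct20d']
```

If `bad_rows` is non-empty, the class value is probably not landing in the same column for those rows. Note that renaming attributes this way can produce duplicate names, so the cleaned header is worth a quick look before saving.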

stackoverflowuser2010