
I have some problems with the RWeka package for R, more precisely with its rule-learning algorithms. I created an .arff file myself, which you can see below. I then ran the JRip and J48 algorithms from RWeka on the data in that file and got the following rules:

> JRip(Failure ~., data=date)
JRIP rules:
===========

 => Failure=no (35.0/11.0)

Number of Rules : 1

> J48(Failure ~., data=date)
J48 pruned tree
------------------
: no (35.0/11.0)

Number of Leaves  :     1

Size of the tree :      1

So my question is: why don't the algorithms find a rule based on the date of production? It's obvious that all products produced on 2013-04-01 are faulty.

What is my mistake here?

Thanks in advance! titus24

@RELATION dataset

@ATTRIBUTE Date-of-Production   DATE "yyyy-MM-dd HH:mm:ss"
@ATTRIBUTE Location    {Frankfurt, Cologne, Hamburg, Munich, Berlin}
@ATTRIBUTE Failure    {yes, no}

@DATA
"2013-04-01 00:00:00",Frankfurt,yes
"2013-04-01 00:00:00",Cologne,yes
"2013-04-01 00:00:00",Munich,yes
"2013-04-01 00:00:00",Hamburg,yes
"2013-04-01 00:00:00",Berlin,yes
"2013-04-01 00:00:00",Frankfurt,yes
"2013-04-01 00:00:00",Cologne,yes
"2013-04-01 00:00:00",Munich,yes
"2013-04-01 00:00:00",Hamburg,yes
"2013-04-01 00:00:00",Berlin,yes
"2013-04-01 00:00:00",Frankfurt,yes
"2012-05-01 00:00:00",Cologne,no
"2012-05-02 00:00:00",Munich,no
"2012-05-03 00:00:00",Hamburg,no
"2012-05-04 00:00:00",Berlin,no
"2012-05-05 00:00:00",Frankfurt,no
"2012-05-06 00:00:00",Cologne,no
"2012-05-07 00:00:00",Munich,no
"2012-05-08 00:00:00",Hamburg,no
"2012-05-09 00:00:00",Berlin,no
"2012-05-10 00:00:00",Frankfurt,no
"2012-05-11 00:00:00",Cologne,no
"2012-05-12 00:00:00",Munich,no
"2012-05-13 00:00:00",Hamburg,no
"2012-05-14 00:00:00",Berlin,no
"2012-05-15 00:00:00",Frankfurt,no
"2012-05-16 00:00:00",Cologne,no
"2012-05-17 00:00:00",Munich,no
"2012-05-18 00:00:00",Hamburg,no
"2012-05-19 00:00:00",Berlin,no
"2012-05-20 00:00:00",Frankfurt,no
"2012-05-21 00:00:00",Cologne,no
"2012-05-22 00:00:00",Munich,no
"2012-05-23 00:00:00",Hamburg,no
"2012-05-24 00:00:00",Berlin,no


Explanation

WEKA internally represents date attributes as floating-point numbers storing the milliseconds since January 1, 1970, 00:00:00 GMT, as stated in the weka.core.Attribute documentation. There appears to be a problem in how RWeka converts R's POSIXct/POSIXt dates to these numbers, which would explain why the learners ignore the date column.
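The unit matters here: R itself stores a POSIXct date as seconds since the same epoch, while WEKA counts milliseconds. A short base-R illustration, using one of the production dates from the question and assuming UTC (the .arff file does not state a time zone):

```r
# R's POSIXct counts seconds since 1970-01-01 00:00:00 UTC
d <- as.POSIXct("2012-05-24 00:00:00", tz = "UTC")
secs <- as.numeric(d)  # drops the class (and tzone) attributes, keeps the number
ms <- secs * 1000      # WEKA's date attributes count milliseconds instead

print(secs)
print(sprintf("%.0f", ms))
```

The same instant is therefore 1337817600 in R's representation and 1337817600000 in WEKA's.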

Solution

Convert the dates to numbers manually, then run the classifier:

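library(RWeka)                         # provides read.arff(), J48() and JRip()
dataset <- read.arff("date.arff")
dataset[,1] <- unclass(dataset[, 1])   # replace POSIXct dates by R's numeric value (seconds since 1970-01-01 UTC)
J48(Failure ~ ., data = dataset)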
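To see why the conversion is enough, here is a base-R sketch that rebuilds the 35 rows of the ARFF listing in memory (the UTC time zone is a hypothetical choice, since the file carries none) and checks two things: predicting "no" for every row misclassifies 11 of 35, matching the (35.0/11.0) leaf from the question, while a single threshold on the numeric date, placed at the latest "no" date, separates the classes perfectly. It does not call RWeka; it only checks that such a split exists.

```r
# Hypothetical helper: production date string -> seconds since the epoch (UTC assumed)
to_secs <- function(s) as.numeric(as.POSIXct(s, tz = "UTC"))

yes_secs <- rep(to_secs("2013-04-01 00:00:00"), 11)           # the 11 faulty rows
no_secs  <- to_secs(sprintf("2012-05-%02d 00:00:00", 1:24))    # 2012-05-01 .. 2012-05-24

dataset <- data.frame(
  date_num = c(yes_secs, no_secs),
  Failure  = factor(c(rep("yes", 11), rep("no", 24)), levels = c("yes", "no"))
)

# Baseline: the one-leaf model "Failure = no" gets every "yes" row wrong
baseline_errors <- sum(dataset$Failure != "no")

# Split at the latest "no" date, the same kind of threshold J48 reports
threshold <- max(dataset$date_num[dataset$Failure == "no"])
predicted <- ifelse(dataset$date_num > threshold, "yes", "no")

print(baseline_errors)
print(table(predicted, actual = dataset$Failure))
```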

Output is the same as in WEKA Explorer 3.7.12:

Date-of-Production <= 1337810400: no (24.0)
Date-of-Production > 1337810400: yes (11.0)

Number of Leaves  :     2

Size of the tree :  3
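The threshold 1337810400 is in seconds (R's POSIXct unit), not WEKA's milliseconds. A quick way to read it back as a date, assuming the answerer's R session ran in a UTC+2 zone in May such as Europe/Berlin (this is not stated in the original):

```r
# Assumption: the timestamps were read in a UTC+2 zone (CEST), e.g. Europe/Berlin
threshold <- 1337810400  # split value from the J48 output, in seconds
split_date <- as.POSIXct(threshold, origin = "1970-01-01", tz = "Europe/Berlin")

print(format(split_date, "%Y-%m-%d %H:%M:%S %Z"))          # local (Berlin) reading
print(format(split_date, "%Y-%m-%d %H:%M:%S", tz = "UTC"))  # same instant in UTC
```

Read that way, the split sits exactly at the last "no" production date (2012-05-24 00:00:00 local time), with every 2013-04-01 row above it.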