I want to classify the demographic data in the MovieLens users table, but J48 gives a strange result. When I classify the same data with C5.0 everything is fine, but I have to use this algorithm (J48).
The structure of my data is shown below:
$ user_id : int 1 2 3 4 5 6 7 8 9 10 ...
$ age : Factor w/ 7 levels "1","18","25",..: 1 7 3 5 3 6 4 3 3 4 ...
$ occupation: Factor w/ 21 levels "0","1","2","3",..: 11 17 16 8 21 10 2 13 18 2 ...
$ gender : Factor w/ 2 levels "F","M": 1 2 2 2 2 1 2 2 2 1 ...
$ Class : Factor w/ 4 levels "1","2","3","4": 2 2 2 2 3 2 2 2 2 4 ...
and the head of the data is:
head(data)
user_id age occupation gender Class
1 1 1 10 F 2
2 2 56 16 M 2
3 3 25 15 M 2
4 4 45 7 M 2
5 5 25 20 M 3
6 6 50 9 F 2
All columns except user_id are of nominal type and should be factors in R.
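For reference, a minimal sketch of that conversion in base R. The toy data frame `df` is typed in from the six rows of head(data) shown above and is only illustrative; it is not my full data, so its factors have fewer levels than the real columns.

```r
# Toy data frame rebuilt from the six rows shown in head(data); illustrative only.
df <- data.frame(
  user_id    = 1:6,
  age        = c(1, 56, 25, 45, 25, 50),
  occupation = c(10, 16, 15, 7, 20, 9),
  gender     = c("F", "M", "M", "M", "M", "F"),
  Class      = c(2, 2, 2, 2, 3, 2),
  stringsAsFactors = FALSE
)

# Convert every column except user_id to a factor (nominal type).
nominal <- setdiff(names(df), "user_id")
df[nominal] <- lapply(df[nominal], factor)

# Show the resulting column types.
str(df)
```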
Code for classification:
library(RWeka)
fit <- J48(data$Class~., data=data[,-c(1)], control = Weka_control(C=0.25))
currentUserClass = predict(fit,data[,-c(1)])
table(currentUserClass , data$Class)
and the resulting confusion table, which is wrong, is:
currentUserClass 1 2 3 4
1 0 0 0 0
2 216 3630 1549 645
3 0 0 0 0
4 0 0 0 0
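To put a number on this, here is a small base-R sketch that rebuilds the table above by hand and compares the accuracy with the share of the most frequent class. The matrix `j48` is typed in from the output, not produced by RWeka, and the variable names are my own. The two values are equal because every user is assigned to class 2.

```r
# Confusion table copied from the J48 output above
# (rows = predicted class, columns = actual class).
j48 <- matrix(
  c(  0,    0,    0,   0,
    216, 3630, 1549, 645,
      0,    0,    0,   0,
      0,    0,    0,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(predicted = as.character(1:4), actual = as.character(1:4))
)

n        <- sum(j48)               # total number of users
accuracy <- sum(diag(j48)) / n     # share of correct predictions
baseline <- max(colSums(j48)) / n  # always guessing the most frequent class

print(c(n = n, accuracy = accuracy, baseline = baseline))
```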
When I fit the model with C5.0, the result is as below, which is what I expect from both algorithms:
predictions 1 2 3 4
1 216 0 0 0
2 0 3630 0 0
3 0 0 1549 0
4 0 0 0 645
More attempts
- I changed the structure of my data, converting the factor columns into separate columns, and nothing changed.
- I changed the C control value; the result gets a little better at C=0.75, but it is still totally wrong.
Even after normalizing and restructuring the data, nothing changed:
> head(data)
user_id age1 age18 age25 age35 age45 age50
1 1 5.1188737 -0.4726289 -0.7289391 -0.4960755 -0.3164894 -0.2990841
2 2 -0.1953231 -0.4726289 -0.7289391 -0.4960755 -0.3164894 -0.2990841
3 3 -0.1953231 -0.4726289 1.3716296 -0.4960755 -0.3164894 -0.2990841
4 4 -0.1953231 -0.4726289 -0.7289391 -0.4960755 3.1591400 -0.2990841
5 5 -0.1953231 -0.4726289 1.3716296 -0.4960755 -0.3164894 -0.2990841
6 6 -0.1953231 -0.4726289 -0.7289391 -0.4960755 -0.3164894 3.3429880
age56 occupation1 occupation2 occupation3 occupation4 occupation5
1 -0.2590882 -0.3094756 -0.2150398 -0.1717035 -0.3790765 -0.1374418
2 3.8590505 -0.3094756 -0.2150398 -0.1717035 -0.3790765 -0.1374418
3 -0.2590882 -0.3094756 -0.2150398 -0.1717035 -0.3790765 -0.1374418
4 -0.2590882 -0.3094756 -0.2150398 -0.1717035 -0.3790765 -0.1374418
5 -0.2590882 -0.3094756 -0.2150398 -0.1717035 -0.3790765 -0.1374418
6 -0.2590882 -0.3094756 -0.2150398 -0.1717035 -0.3790765 -0.1374418
occupation6 occupation7 occupation8 occupation9 occupation10 occupation11
1 -0.2016306 -0.3558574 -0.05312294 -0.1243576 5.4744311 -0.1477163
2 -0.2016306 -0.3558574 -0.05312294 -0.1243576 -0.1826371 -0.1477163
3 -0.2016306 -0.3558574 -0.05312294 -0.1243576 -0.1826371 -0.1477163
4 -0.2016306 2.8096490 -0.05312294 -0.1243576 -0.1826371 -0.1477163
5 -0.2016306 -0.3558574 -0.05312294 -0.1243576 -0.1826371 -0.1477163
6 -0.2016306 -0.3558574 -0.05312294 8.0399919 -0.1826371 -0.1477163
occupation12 occupation13 occupation14 occupation15 occupation16 occupation17
1 -0.2619865 -0.1551514 -0.2293967 -0.1562667 -0.2038431 -0.3010506
2 -0.2619865 -0.1551514 -0.2293967 -0.1562667 4.9049217 -0.3010506
3 -0.2619865 -0.1551514 -0.2293967 6.3982549 -0.2038431 -0.3010506
4 -0.2619865 -0.1551514 -0.2293967 -0.1562667 -0.2038431 -0.3010506
5 -0.2619865 -0.1551514 -0.2293967 -0.1562667 -0.2038431 -0.3010506
6 -0.2619865 -0.1551514 -0.2293967 -0.1562667 -0.2038431 -0.3010506
occupation18 occupation19 occupation20 genderM Class
1 -0.1082744 -0.1098287 -0.2208735 -1.5917949 2
2 -0.1082744 -0.1098287 -0.2208735 0.6281176 2
3 -0.1082744 -0.1098287 -0.2208735 0.6281176 2
4 -0.1082744 -0.1098287 -0.2208735 0.6281176 2
5 -0.1082744 -0.1098287 4.5267283 0.6281176 3
6 -0.1082744 -0.1098287 -0.2208735 -1.5917949 2
> fit <- J48(data$Class~., data=data, control = Weka_control(C=0.25))
> currentUserClass = predict(fit,data)
> table(currentUserClass , data$Class)
currentUserClass 1 2 3 4
1 7 1 2 2
2 201 3601 1470 617
3 8 28 75 14
4 0 0 2 12
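The same kind of check on this last table, as a hedged base-R sketch: the matrix `j48_norm` is again typed in by hand from the output above, and the names are my own. It computes the overall accuracy and the per-class recall (correct predictions divided by the actual size of each class), which shows how few users from classes 1, 3 and 4 are recovered.

```r
# Confusion table copied from the output above
# (rows = predicted class, columns = actual class).
j48_norm <- matrix(
  c(  7,    1,    2,   2,
    201, 3601, 1470, 617,
      8,   28,   75,  14,
      0,    0,    2,  12),
  nrow = 4, byrow = TRUE,
  dimnames = list(predicted = as.character(1:4), actual = as.character(1:4))
)

# Overall accuracy: correct predictions over all users.
accuracy <- sum(diag(j48_norm)) / sum(j48_norm)

# Per-class recall: correct predictions over the actual class size.
recall <- diag(j48_norm) / colSums(j48_norm)

print(round(accuracy, 3))
print(round(recall, 3))
```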