
I have this dataset:

"Density","bodyfat","Age","stato"
1.0708,12.3,23,atletico
1.0853,6.1,22,atletico
1.0414,25.3,22,sopraMedia
1.0751,10.4,26,atletico
1.0414,25.3,22,sopraMedia
1.0321,10.4,26,atletico
1.0561,25.3,22,sopraMedia
1.0752,3.1,26,pesoMinimo
1.0987,26.2,22,obeso
1.0654,15.4,26,buonoStato
1.0321,16.9,22,buonoStato
1.0451,10.4,26,atletico
1.0924,27.3,22,obeso
1.0461,1.4,26,pesoMinimo
1.0155,25.3,22,sopraMedia
1.0112,10.4,26,atletico
1.0785,3.3,22,pesoMinimo
1.0776,28.1,26,obeso

I want these classification rules:

2 < bodyFat < 4 => pesoMinimo
6 < bodyFat < 13 => atletico
14 < bodyFat < 17 => buonoStato
18 < bodyFat < 25 => sopraMedia
bodyFat > 26 => obeso
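Since these rules are already known, they can be applied directly without fitting a tree at all. A minimal sketch in R with base `cut()`, assuming the gaps between the stated ranges (e.g. 4–6, 13–14) are closed at their midpoints; the function name `classify_stato` is made up for illustration:

```r
# Rule-based classification from the stated bodyfat thresholds.
# Gaps between ranges are closed at the midpoints (5, 13.5, 17.5, 25.5).
classify_stato <- function(bodyfat) {
  cut(bodyfat,
      breaks = c(-Inf, 5, 13.5, 17.5, 25.5, Inf),
      labels = c("pesoMinimo", "atletico", "buonoStato",
                 "sopraMedia", "obeso"))
}

classify_stato(c(3.1, 10.4, 15.4, 25.3, 30.7))
# pesoMinimo atletico buonoStato sopraMedia obeso
```

With this approach, bodyfat = 30.7 falls in the open-ended top interval and is classified as "obeso" by construction.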

The code that I used for classification is this:

library(party)

# Read the data file
mydata <- read.csv("/home/Bodyfat.csv")

# Check the structure of the data
str(mydata)

# Split into training (70%) and test (30%) sets
set.seed(1234)
ind <- sample(2, nrow(mydata), replace = TRUE, prob = c(0.7, 0.3))
trainData <- mydata[ind == 1, ]
testData <- mydata[ind == 2, ]

# Fit a conditional inference tree on the training data
myFormula <- stato ~ bodyfat
albero <- ctree(myFormula, data = trainData)
table(predict(albero), trainData$stato)

print(albero)
plot(albero)

# Predict on a single new observation (note: this overwrites testData)
testData <- data.frame(Density = 1.0515, bodyfat = 30.7, Age = 30, stato = "")
testPred <- predict(albero, newdata = testData)
table(testPred, testData$stato)

The result I obtain is not good.

First prediction, from `table(predict(albero), trainData$stato)`:

                 atletico buonoStato obeso pesoMinimo sopraMedia
  atletico          5          2     3          2          3
  buonoStato        0          0     0          0          0
  obeso             0          0     0          0          0
  pesoMinimo        0          0     0          0          0
  sopraMedia        0          0     0          0          0

Second prediction I obtain:

> table(testPred,testData$stato)

 testPred      
   atletico   1
   buonoStato 0
   obeso      0
   pesoMinimo 0
   sopraMedia 0

But in the new data I have bodyfat = 30.7, which should be "obeso", not "atletico".

Why does it not work correctly?

DPUT:

> dput(mydata)
structure(list(Density = c(1.0708, 1.0853, 1.0414, 1.0751, 1.0414, 
1.0321, 1.0561, 1.0752, 1.0987, 1.0654, 1.0321, 1.0451, 1.0924, 
1.0461, 1.0155, 1.0112, 1.0785, 1.0776), bodyfat = c(12.3, 6.1, 
25.3, 10.4, 25.3, 10.4, 25.3, 3.1, 26.2, 15.4, 16.9, 10.4, 27.3, 
1.4, 25.3, 10.4, 3.3, 28.1), Age = c(23L, 22L, 22L, 26L, 22L, 
26L, 22L, 26L, 22L, 26L, 22L, 26L, 22L, 26L, 22L, 26L, 22L, 26L
), stato = structure(c(1L, 1L, 5L, 1L, 5L, 1L, 5L, 4L, 3L, 2L, 
2L, 1L, 3L, 4L, 5L, 1L, 4L, 3L), .Label = c("atletico", "buonoStato", 
"obeso", "pesoMinimo", "sopraMedia"), class = "factor")), .Names = c("Density", 
"bodyfat", "Age", "stato"), class = "data.frame", row.names = c(NA, 
 -18L))
    and the question is... – R. Prost Jun 12 '18 at 07:53
  • @R.Prost why it does not work correctly? – bomberdini Jun 12 '18 at 08:10
  • if you already know the rules of classification you don't need to predict them (?) ... also please use `dput` on your dataset so we can easily paste it into R . I would also print the coefficients obtained (or tree) rather than just the resulting table. thanks – R. Prost Jun 12 '18 at 09:33
  • @R.Prost thanks for the answer. I would like to add that any "bodyfat" value always returns "atletico", so I think that predict does not work. I inserted the dput. – bomberdini Jun 12 '18 at 09:48
  • Hm, from your first prediction and looking at the result from albero, your tree has no branches (plot albero). It classifies everything as atletico indeed. I am not an expert in trees, I am afraid, but you seem to have very few observations. If you double the number of observations (I just copied the same data over) then you start having some results. – R. Prost Jun 13 '18 at 07:02
  • with doubling the number of values in mydata (just duplicating the values) testPred gives atletico 0 buonoStato 0 obeso 1 pesoMinimo 0 sopraMedia 0 – R. Prost Jun 13 '18 at 07:04
  • @R.Prost Thank you very much. I try – bomberdini Jun 13 '18 at 07:06
  • If you _know_ what the tree is then you do not have to _learn_ it with `ctree`. Look at `vignette("partykit", package = "partykit")` for a worked example of how to put together an exogenously given tree and then print/plot/predict it. If you really want to _learn_ the tree then you either need a bigger sample or you need to switch off pre-pruning (which tries to guard you against overfitting). – Achim Zeileis Jun 13 '18 at 12:08
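Following the suggestion in the last comment to switch off pre-pruning, a sketch using `party::ctree_control` (assuming `mydata` as loaded from the dput above; the specific parameter values are illustrative, chosen to let the tree split on a sample this small):

```r
library(party)

# Relax ctree's pre-pruning so it can split on very few observations:
#   mincriterion = 0  -> disable the significance-based stopping rule
#   minsplit = 2, minbucket = 1 -> allow splits on tiny node sizes
albero2 <- ctree(stato ~ bodyfat, data = mydata,
                 controls = ctree_control(mincriterion = 0,
                                          minsplit = 2,
                                          minbucket = 1))

print(albero2)  # should now show actual branches instead of a single root node
predict(albero2, newdata = data.frame(bodyfat = 30.7))
```

Note that disabling pre-pruning like this invites overfitting on a real dataset; here it only serves to show that the original tree had no splits because 18 observations are too few to pass ctree's default significance test.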
