
I am coming back to R after a year and want to use rpart for a classification tree.

My data looks like:

Category, Shape, Color, Yes, No
A, Square, Blue, 3, 2
B, Triangle, Blue, 2, 4
etc. 

How can I reshape it into the format below, with one row per observation, so I can use rpart? (I believe rpart needs the data in this form.)

ID, Shape, Color, Result
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, No
A, Square, Blue, No
B, Triangle, Blue, Yes
etc...

Thank you!

As3adTintin

2 Answers


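You can use melt from reshape2, followed by rep:

library(reshape2)
# Long format: one row per Category/Shape/Color and outcome, with its count in value
s=melt(df,id.vars=c('Category','Shape','Color'))
# Repeat each row as many times as its count
s[ rep( 1:nrow(s) , s$value ),]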
              Category     Shape Color variable value
1                    A    Square  Blue      Yes     3
1.1                  A    Square  Blue      Yes     3
1.2                  A    Square  Blue      Yes     3
2                    B  Triangle  Blue      Yes     2
2.1                  B  Triangle  Blue      Yes     2
3                    A    Square  Blue       No     2
3.1                  A    Square  Blue       No     2
4                    B  Triangle  Blue       No     4
4.1                  B  Triangle  Blue       No     4
4.2                  B  Triangle  Blue       No     4
4.3                  B  Triangle  Blue       No     4
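
For comparison, the same expansion can be sketched in base R without reshape2. This is only an illustration, not part of the answer above: the data frame is rebuilt from the two rows shown in the question, and the names `idx`, `long`, and `Result` are my own choices.

```r
# Example data copied from the two rows shown in the question
df <- data.frame(
  Category = c("A", "B"),
  Shape    = c("Square", "Triangle"),
  Color    = c("Blue", "Blue"),
  Yes      = c(3, 2),
  No       = c(2, 4)
)

# Repeat each row index once per observation (Yes + No)
idx  <- rep(seq_len(nrow(df)), times = df$Yes + df$No)
long <- df[idx, c("Category", "Shape", "Color")]

# For each original row: "Yes" repeated Yes times, then "No" repeated No times
long$Result <- factor(
  unlist(Map(function(y, n) rep(c("Yes", "No"), times = c(y, n)),
             df$Yes, df$No)),
  levels = c("Yes", "No")
)
rownames(long) <- NULL
```

Unlike the melt output, this keeps the rows grouped by Category and leaves no count column behind, which is closer to the layout asked for in the question.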
BENY

Melt the data into long format, then repeat each variable label the number of times given in the value column.

library(data.table)
melt(setDT(dat),1:3)[,rep(variable,value),by=.(Category,Shape,Color)]
            Category     Shape Color  V1
 1:                A    Square  Blue Yes
 2:                A    Square  Blue Yes
 3:                A    Square  Blue Yes
 4:                A    Square  Blue  No
 5:                A    Square  Blue  No
 6:                B  Triangle  Blue Yes
 7:                B  Triangle  Blue Yes
 8:                B  Triangle  Blue  No
 9:                B  Triangle  Blue  No
10:                B  Triangle  Blue  No
11:                B  Triangle  Blue  No

Using the tidyverse:

library(tidyverse)

dat%>%
  rowwise()%>%
  mutate(var=list(rep(c("Yes","No"),c(Yes,No))))%>%
  select(-Yes,-No)%>%
  unnest(cols = var)
 Category   Shape    Color var  
  <fct>    <fct>    <fct> <chr>
 1 A        Square   Blue  Yes  
 2 A        Square   Blue  Yes  
 3 A        Square   Blue  Yes  
 4 A        Square   Blue  No   
 5 A        Square   Blue  No   
 6 B        Triangle Blue  Yes  
 7 B        Triangle Blue  Yes  
 8 B        Triangle Blue  No   
 9 B        Triangle Blue  No   
10 B        Triangle Blue  No   
11 B        Triangle Blue  No   
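
Once the data is in one-row-per-observation form, fitting the tree might look like the sketch below. The data frames here are hypothetical: the Circle/Green group is made up so that both predictors vary, and `long`, `counts`, `n`, `fit`, and `fit_w` are illustrative names. The second call shows that rpart also accepts case weights, so keeping one row per group and outcome with a count column may be enough; I have not checked that both fits give identical trees.

```r
library(rpart)

# Hypothetical expanded data: one row per observation
long <- data.frame(
  Shape  = rep(c("Square", "Triangle", "Circle"), times = c(5, 6, 4)),
  Color  = rep(c("Blue", "Blue", "Green"), times = c(5, 6, 4)),
  Result = c(rep(c("Yes", "No"), times = c(3, 2)),
             rep(c("Yes", "No"), times = c(2, 4)),
             rep(c("Yes", "No"), times = c(1, 3))),
  stringsAsFactors = TRUE
)

# method = "class" requests a classification tree; minsplit is lowered
# only because this toy data set is tiny
fit <- rpart(Result ~ Shape + Color, data = long, method = "class",
             control = rpart.control(minsplit = 2))

# Alternative: one row per group and outcome, counts passed as case weights
counts <- data.frame(
  Shape  = rep(c("Square", "Triangle", "Circle"), each = 2),
  Color  = rep(c("Blue", "Blue", "Green"), each = 2),
  Result = rep(c("Yes", "No"), times = 3),
  n      = c(3, 2, 2, 4, 1, 3),
  stringsAsFactors = TRUE
)
fit_w <- rpart(Result ~ Shape + Color, data = counts, weights = n,
               method = "class", control = rpart.control(minsplit = 2))
```

Calling `print(fit)` or `plot(fit); text(fit)` then shows the splits.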
Onyambu