
I was trying to convert some continuous integers to categorical ranges, but something happened that I do not understand. Although I fixed it and got what I wanted, I still don't understand why it happened.

The variable holds integers from 0 to 12. The following code leaves 10, 11 and 12 out of the 5+ category.

py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==0]<-"0"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==1]<-"1"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==2]<-"2"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==3]<-"3"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==4]<-"4"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain>=5]<-"5+"
py2$Daily.Whole.Grain<-as.factor(py2$Daily.Whole.Grain)

But when I change the order of the conversions, 10, 11 and 12 are included.

py2$Daily.Whole.Grain[py2$Daily.Whole.Grain>=5]<-"5+"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==0]<-"0"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==1]<-"1"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==2]<-"2"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==3]<-"3"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==4]<-"4"

Can anyone explain why it leaves the double-digit integers out? Thanks very much.

Cath
user2935184
  • You are changing your variable to `character`, and you can check that `"10" > "5"` gives `FALSE`, hence the absence of `10`, `11` and `12` (but `52` would be included). The best approach would be to create another variable instead of modifying the existing one (and you can avoid doing it in 6 lines), or you can use `as.integer` if you really want to modify your variable. – Cath Mar 24 '15 at 14:42
  • Actually, you can just do `py2$Daily.Whole.Grain[py2$Daily.Whole.Grain>=5]<-"5+"` to get what you want, as the other values are just converted to character. – Cath Mar 24 '15 at 14:46
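
The coercion described in the comments above can be checked on a small vector. This is only a minimal sketch: `x` is a made-up stand-in for `py2$Daily.Whole.Grain`, and `ge5` is a helper name introduced here for illustration.

```r
# Hypothetical toy vector standing in for py2$Daily.Whole.Grain.
x <- c(0, 3, 5, 9, 10, 12)

# Assigning a string into a numeric vector coerces the whole vector
# to character, so every later comparison is a string comparison.
x[x == 0] <- "0"
class(x)   # "character"

# 5 is coerced to "5"; "10" and "12" start with "1", which sorts
# before "5", so they fail the test. "9" still passes.
ge5 <- x >= 5
ge5        # FALSE FALSE  TRUE  TRUE FALSE FALSE

x[ge5] <- "5+"
x          # "0" "3" "5+" "5+" "10" "12"
```

Running the `>= 5` line first works because the column is still numeric at that point; the `== 0` to `== 4` tests afterwards happen to give the same answer on strings.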

1 Answer

As @CathG mentioned, the problem is due to the column being converted from a numeric class to character. A perhaps better solution is the `cut` function, which returns a factor based on cut-points of a variable:

py2 <- data.frame(Daily.Whole.Grain = 1:10)
py2$Daily.Whole.Grain1 <- cut(py2$Daily.Whole.Grain, 
    breaks = c(1:5, Inf), right = FALSE, labels = c(1:4, "5+"))
py2
   Daily.Whole.Grain Daily.Whole.Grain1
1                  1                  1
2                  2                  2
3                  3                  3
4                  4                  4
5                  5                 5+
6                  6                 5+
7                  7                 5+
8                  8                 5+
9                  9                 5+
10                10                 5+
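
The example above starts at 1, whereas the question's data run from 0 to 12. With `breaks = c(1:5, Inf)` a value of 0 would fall below the lowest break and come back as `NA`. A possible adaptation, as a sketch only: the data frame below is made up, and the column names `grain_cut` and `grain_ifelse` are hypothetical.

```r
# Hypothetical data covering the full 0 to 12 range from the question.
py2 <- data.frame(Daily.Whole.Grain = 0:12)

# Option 1: cut() with left-closed intervals [0,1), [1,2), ..., [5,Inf).
py2$grain_cut <- cut(py2$Daily.Whole.Grain,
    breaks = c(0:5, Inf), right = FALSE, labels = c(0:4, "5+"))

# Option 2: test the still-numeric column once, then build the factor
# in a new column so the original values are never overwritten.
py2$grain_ifelse <- factor(
    ifelse(py2$Daily.Whole.Grain >= 5, "5+", py2$Daily.Whole.Grain),
    levels = c(0:4, "5+"))

# Counts per category: one each for 0 to 4, eight for "5+".
table(py2$grain_cut)
```

Both options write to a new column, so the numeric comparison is never made against values that have already been turned into strings.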
Jeff
  • Thanks Jeff, I did not want to use the cut function because I wanted nicer category names. @CathG's 2nd comment really helped me understand what was going on. Thank you both. – user2935184 Mar 24 '15 at 15:00
  • You are welcome. You could also add `labels = c(1:4, "5+")` to the `cut` function to get what you want, but @CathG's solution is a bit shorter! (I've edited my answer to have this). – Jeff Mar 24 '15 at 15:10