
I would like to add a column to my data frame that contains categorical data based on the numbers in another column. I found a similar question, "Create categorical variable in R based on range", but the answers there didn't give me what I need. Basically, I need a result like this:

x   group
3   0-5
4   0-5
6   6-10
12  > 10

The solutions suggested using cut() and shingle(), and while those are useful for dividing the data based on ranges, they do not create the new categorical column that I need.

I have also tried using something like (please don't laugh)

data$group <- "0-5"==data[data$x>0 & data$x<5, ]

but that of course didn't work. Does anyone know how I might do this correctly?

Thomas

1 Answer


Why didn't cut work? Did you not assign to a new column or something?

> data=data.frame(x=c(3,4,6,12))
> data$group = cut(data$x,c(0,5,10,15))
> data
   x   group
1  3   (0,5]
2  4   (0,5]
3  6  (5,10]
4 12 (10,15]

What you've created there is a factor object in a column of your data frame. The text displayed is the levels of the factor, and you can change them by assignment:

levels(data$group) = c("0-5","6-10",">10")
data
   x group
1  3   0-5
2  4   0-5
3  6  6-10
4 12   >10

Read some basic R docs on factors and you'll get it.
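As a minimal sketch of a one-step variant: `cut()` also takes a `labels` argument, so the level names can be set when the column is created. The open-ended upper break `Inf` is an assumption added here so that every value above 10 lands in ">10". With the `15` used above, anything larger than 15 would become `NA`.

```r
# Same toy data as in the answer above
data <- data.frame(x = c(3, 4, 6, 12))

# Breaks at 0, 5, 10 and Inf give the intervals (0,5], (5,10], (10,Inf].
# labels= names them directly, so no later levels() assignment is needed.
# Inf as the last break is an assumption for illustration, not part of
# the original answer.
data$group <- cut(data$x,
                  breaks = c(0, 5, 10, Inf),
                  labels = c("0-5", "6-10", ">10"))

data               # x alongside the new factor column
class(data$group)  # "factor"
table(data$group)  # counts per category
```

Note that the intervals are closed on the right by default, so a value of exactly 0 falls outside `(0,5]` and becomes `NA` unless `include.lowest = TRUE` is passed.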

Spacedman
  • Spacedman -- thanks, your solution worked! Indeed I did not assign a new column. Do you know how I can get the categories to appear as "0-5" instead of "(0,5]"? – Thomas Jan 10 '14 at 17:04
  • @Thomas use `labels` argument, try this `cut(data$x,c(0,5,10,15), labels=c("0-5", "6-10", ">10"))` and take a look at `?cut`, read the documentation. – Jilber Urbina Jan 10 '14 at 17:08
  • Is there a way to change the class from logical to factor (so I can use this to group for plotting)? I've tried data$group <- factor(data$group), data$group <- as.factor(data$group), etc. but doesn't work. – Thomas Jan 10 '14 at 17:39
  • Huh? It is a factor. Logical has nothing to do with it. – Spacedman Jan 10 '14 at 17:43
  • Please don't just say "I tried X and it didn't work". Show us your code, what you expected, and why you think you didn't get what you expected, and if necessary start a new question or edit this one. – Spacedman Jan 10 '14 at 17:44
  • Thanks Spacedman. Sorry for the confusion. I was getting an error when using this new "group" with some packages, but fixed it by using "data$group". If I included "group" in my input data file, I didn't have to use this, which is why I was confused. Also, class(group) is logical whereas class(data$group) is factor, which is why I was confused. Thanks again for all of your help, I wouldn't have solved this without your assistance. – Thomas Jan 10 '14 at 18:11
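
The logical-versus-factor confusion in the last comment can be reproduced with a short sketch. It assumes a standalone variable named `group` exists in the workspace next to the data frame column of the same name. The thread does not show what that variable actually held, so the `NA` below is purely hypothetical.

```r
data <- data.frame(x = c(3, 4, 6, 12))
data$group <- cut(data$x,
                  breaks = c(0, 5, 10, Inf),
                  labels = c("0-5", "6-10", ">10"))

# Hypothetical stray workspace variable with the same name as the column
group <- NA

class(group)              # the standalone variable: "logical"
class(data$group)         # the data frame column: "factor"
with(data, class(group))  # with() looks the name up inside data first: "factor"
```

A bare `group` refers to the workspace variable, not the column, so the column has to be reached through `data$group`, `with(data, ...)`, or a function's `data =` argument where one is available.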