
I have a data.frame with a column whose values range from 0 to 50,000. I want to create 5 categories for this data so that I can plot it as a categorized histogram.

What I want is a new column that tells me which category each value belongs to, so that I can plot by it. For instance, I decided on the following categories: [1,3] (3,6] (6,12] (12,30] (30,50000]

Is this possible? Is there an easier way to do it? I normally use the ggplot2 library for plots.

Thanks in advance.

biojl
  • Use `cut` to do exactly this. – Andrie Apr 27 '12 at 13:36
  • Possible duplicate of [R adding column which contains bin value of another column](http://stackoverflow.com/questions/5570293/r-adding-column-which-contains-bin-value-of-another-column) – Andrie Apr 27 '12 at 13:37
  • @biojl Here are some previous answers to this same question: [Create categorical variable in R based on range](http://stackoverflow.com/questions/2647639/create-categorical-variable-in-r-based-on-range) – Ari B. Friedman Apr 27 '12 at 13:43

1 Answer

See `?cut`. Here is an example:

set.seed(42)
dat <- data.frame(Values = sample.int(50000, size = 100))
## create a factor indicating which category each value falls in
grps <- with(dat, cut(Values, breaks = c(1,3,6,12,30,50000)))

This gives:

> head(grps)
[1] (30,5e+04] (30,5e+04] (30,5e+04] (30,5e+04] (30,5e+04] (30,5e+04]
Levels: (1,3] (3,6] (6,12] (12,30] (30,5e+04]
> table(grps)
grps
     (1,3]      (3,6]     (6,12]    (12,30] (30,5e+04] 
         0          0          1          0         99
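
Note that the intervals above are open on the left, so the first level is `(1,3]` rather than the `[1,3]` asked for in the question, and a value of exactly 1 would come out as `NA`. A minimal sketch of one way to close the first interval, using the `include.lowest` argument of `cut`; the vector `x` is made-up example data chosen to hit the boundaries:

```r
# Hypothetical example values sitting on the break points
x    <- c(1, 3, 4, 12, 30, 50000)
brks <- c(1, 3, 6, 12, 30, 50000)

# Default behaviour: intervals are (a, b], so the lowest break (1) is excluded
open_left <- cut(x, breaks = brks)

# include.lowest = TRUE closes the first interval, so 1 lands in [1,3]
closed_left <- cut(x, breaks = brks, include.lowest = TRUE)

is.na(open_left)      # the first element is NA with the default settings
levels(closed_left)   # the first level is now "[1,3]"
```

Values below the first break (the question mentions values down to 0) would still be `NA` either way; if that matters, the first break would presumably need to be lowered to cover them.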

If you want that in the data frame, try this instead:

dat2 <- within(dat, Groups <- cut(Values, breaks = c(1,3,6,12,30,50000)))

This results in:

> head(dat2)
  Values     Groups
1  45741 (30,5e+04]
2  46853 (30,5e+04]
3  14307 (30,5e+04]
4  41520 (30,5e+04]
5  32085 (30,5e+04]
6  25953 (30,5e+04]

You can change the levels of the resulting factor if you want to give the categories different labels.
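
As a rough sketch of the relabelling, either pass `labels` to `cut` directly or assign to `levels()` afterwards. The label strings below are invented for illustration and assume integer values; `include.lowest = TRUE` is added here so that a value of 1 is not dropped:

```r
set.seed(42)
dat  <- data.frame(Values = sample.int(50000, size = 100))
brks <- c(1, 3, 6, 12, 30, 50000)

# Hypothetical labels, one per interval (there is one fewer label than breaks)
labs <- c("1-3", "4-6", "7-12", "13-30", "31-50000")

# Option 1: supply the labels when cutting
dat$Groups <- cut(dat$Values, breaks = brks, labels = labs,
                  include.lowest = TRUE)

# Option 2: cut first, then rename the levels of the existing factor
grps <- cut(dat$Values, breaks = brks, include.lowest = TRUE)
levels(grps) <- labs

table(dat$Groups)
```

With ggplot2, which the question mentions, something along the lines of `ggplot(dat, aes(x = Groups)) + geom_bar()` should then plot the count per category.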

Gavin Simpson