11

I have a vector x of non-negative numbers that I want to bin/discretize. Values in [0, 10) should show up just as they exist in the vector, but values in [10, ∞) should become "10+".

I'm using:

x <- c(0,1,3,4,2,4,2,5,43,432,34,2,34,2,342,3,4,2)
binned.x <- as.factor(ifelse(x > 10,"10+",x))

but this feels kludgy to me. Does anyone know a better solution or a different approach?

Jaap
mcpeterson
  • 1
    What's kludgy about that? It looks pretty neat to me. – Rob Hyndman Mar 24 '10 at 05:33
  • 2
    @Rob: The main drawback of this approach is that you don't get factor levels created for values that aren't there (e.g., for this data there is no level "6"). This could be fixed with explicit levels in the call to `factor`. – Richie Cotton Mar 24 '10 at 11:45
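
A minimal sketch of the explicit-levels fix mentioned in the comment above. It assumes `x` holds whole numbers only and uses `>=` so that 10 falls into "10+", as the question's description asks; the name `all.levels` is just for illustration.

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

# Spell out every level so that bins with no observations (6 to 9 here) are kept.
# Assumption: x contains whole numbers; a value such as 2.5 matches no level
# and would become NA.
all.levels <- c(0:9, "10+")
binned.x <- factor(ifelse(x >= 10, "10+", x), levels = all.levels)

# The table now lists all eleven levels, including the empty ones.
table(binned.x)
```

Calling `table(binned.x)` is a quick way to confirm that the empty levels are present with a count of zero.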

3 Answers

22

How about cut:

binned.x <- cut(x, breaks = c(-1:9, Inf), labels = c(as.character(0:9), '10+'))

Which yields:

 # [1] 0   1   3   4   2   4   2   5   10+ 10+ 10+ 2   10+ 2   10+ 3   4   2  
 # Levels: 0 1 2 3 4 5 6 7 8 9 10+
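
If `x` might contain non-integer values, one possible variant (an assumption on my part, not something the question asks for) is to use left-closed intervals, so that each bin is [k, k+1) and everything from 10 upward lands in "10+":

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

# right = FALSE makes the intervals [0,1), [1,2), ..., [9,10), [10,Inf),
# so 9.5 is binned as "9" and exactly 10 is binned as "10+".
binned.x <- cut(x, breaks = c(0:10, Inf), right = FALSE,
                labels = c(0:9, "10+"))

# Compare against the other approaches by tabulating the result.
table(binned.x)
```

For the integer data in the question this gives the same factor as the `cut` call above; the two only differ for fractional values.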
Henrik
unutbu
  • 2
    That seems more complicated than the solution in the question. – Rob Hyndman Mar 24 '10 at 05:33
  • 1
    Minor improvements: Swap `1e6` with `Inf`. You don't need `include.lowest=TRUE`. (Compare answers by calling `table(binned.x)`.) – Richie Cotton Mar 24 '10 at 11:43
  • @Rob: Yes, I can't say I disagree :) @Richie: Thanks! I'm still learning the language, so your "minor improvements" are a major help to me. – unutbu Mar 24 '10 at 13:42
7

Your question is inconsistent.
In the description, 10 belongs to the "10+" group, but in the code 10 is a separate level. If 10 should be in the "10+" group, then your code should be

as.factor(ifelse(x >= 10,"10+",x))

In this case you could also cap the values at 10 (if you don't need a factor):

pmin(x, 10)
# [1]  0  1  3  4  2  4  2  5 10 10 10  2 10  2 10  3  4  2
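
If a factor is wanted after all, the capped values can be relabelled. This is only a sketch and assumes `x` contains whole numbers; the name `capped` is made up for the example.

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

# Cap everything at 10, then relabel the value 10 as "10+".
# Assumption: x holds whole numbers, so every capped value is one of 0:10.
capped <- pmin(x, 10)
binned.x <- factor(capped, levels = 0:10, labels = c(0:9, "10+"))

table(binned.x)
```

Because the levels are given explicitly, values that never occur (6 to 9 in this data) still appear as empty levels.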
Marek
3
x[x>=10]<-"10+"

This will give you a vector of strings. You can use as.numeric(x) to convert back to numbers (each "10+" becomes NA, with a coercion warning), or as.factor(x) to get your result above.

Note that this will modify the original vector itself, so you may want to copy to another vector and work on that.
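
A small sketch of the copy-first variant described above; the names `y` and `nums` are just for illustration.

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

# Work on a copy so the original numeric vector stays untouched.
y <- x
y[y >= 10] <- "10+"   # the assignment turns y into a character vector

# Converting back: each "10+" becomes NA (the warning is suppressed here).
nums <- suppressWarnings(as.numeric(y))

# As a factor; only values present in y become levels, sorted as strings.
binned.x <- as.factor(y)
levels(binned.x)
```

Note that the levels come from the strings that actually occur, so absent values get no level unless you pass `levels` to `factor` explicitly.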

James