Using the tidyverse, I'm looking to discretize numerical data so that I can plot the numerical ranges in a bar chart as if they were categories, manually declaring where the cuts occur, as with age groups or income ranges. I want the intervals to have unequal widths.

So far, I've tried the base R approach, using cut() and setting the bins with breaks = c(). I notice, however, that there exists a set of functions, cut_interval, cut_width, and cut_number, in the ggplot2 package. I figure there's a way to set the interval cuts manually with these functions, because the breaks argument exists for the interval and number variants.

library(tidyverse)

mtcars <- as_tibble(mtcars)

mtcars %>% 
  count(cut_interval(mpg, n = 4))
#> # A tibble: 4 x 2
#>   `cut_interval(mpg, n = 4)`     n
#>   <fct>                      <int>
#> 1 [10.4,16.3]                   10
#> 2 (16.3,22.1]                   13
#> 3 (22.1,28]                      5
#> 4 (28,33.9]                      4

mtcars %>% 
  count(cut_interval(mpg, n = 4, breaks = c(10, 18, 23, 28, 35)))
#> Error: Evaluation error: lengths of 'breaks' and 'labels' differ.

Created on 2019-06-03 by the reprex package (v0.2.1)

The above is close to what I want, but it sets the breaks based on the number of intervals.

In the above example, I would like my groups to be precisely as follows:

10-18, 19-23, 24-28, 29-35.

Is this possible using the breaks argument? Thank you.

Chris A.

1 Answer

You can just use the actual base cut function to do this:

library(tidyverse)

# cut() sorts its breaks, so the leading Inf ends up last and only adds an empty (35,Inf] bin
mtcars %>% 
    mutate(bin = cut(mpg, breaks = c(Inf, 10, 18, 19, 23, 24, 28, 29, 35))) %>% 
    count(bin)

Which will give you the following (bins with no observations, such as (23,24] and (28,29], do not appear in the count):

# A tibble: 5 x 2
  bin         n
  <fct>   <int>
1 (10,18]    13
2 (18,19]     2
3 (19,23]    10
4 (24,28]     3
5 (29,35]     4
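
To get exactly the four groups named in the question, a minimal sketch along the same lines passes only the outer boundaries to cut() and attaches labels. The `labels` vector and the names `breaks`, `binned`, and `p` are illustrative choices, not part of the original answer. Because `mpg` is continuous and cut() uses right-closed intervals by default, the label "19-23" really covers (18, 23], so a value such as 18.1 lands in that group.

```r
library(dplyr)
library(ggplot2)

# Boundaries for the four unequal-width groups requested in the question.
# With the default right = TRUE, each interval is (lower, upper].
breaks <- c(10, 18, 23, 28, 35)

# Hypothetical display labels: one per interval, so one fewer than breaks.
labels <- c("10-18", "19-23", "24-28", "29-35")

binned <- mtcars %>%
  mutate(bin = cut(mpg, breaks = breaks, labels = labels)) %>%
  count(bin)

binned

# Bar chart that treats the bins as categories.
p <- ggplot(binned, aes(x = bin, y = n)) +
  geom_col()

p
```

If the lowest value could equal the first break exactly, adding `include.lowest = TRUE` to the cut() call keeps it from becoming `NA`.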
divibisan
MDEWITT