Using the tidyverse, I'm looking to discretize numerical data so that I can plot the resulting ranges in a bar chart as if the data were categorical. I want to declare manually where the cuts occur, as with age groups or income ranges, and the intervals should be of unequal width.
So far, I've tried the base R approach, using cut() and setting the bins with breaks = c(). I notice, however, that there exists a set of functions, cut_interval, cut_width, and cut_number, in the ggplot2 package. I figure that there's a way to manually set the interval cuts using these functions, because the breaks argument exists for the interval and number variants.
library(tidyverse)
mtcars <- as_tibble(mtcars)
mtcars %>%
  count(cut_interval(mpg, n = 4))
#> # A tibble: 4 x 2
#> `cut_interval(mpg, n = 4)` n
#> <fct> <int>
#> 1 [10.4,16.3] 10
#> 2 (16.3,22.1] 13
#> 3 (22.1,28] 5
#> 4 (28,33.9] 4
mtcars %>%
  count(cut_interval(mpg, n = 4, breaks = c(10, 18, 23, 28, 35)))
#> Error: Evaluation error: lengths of 'breaks' and 'labels' differ.
Created on 2019-06-03 by the reprex package (v0.2.1)
The above is close to what I want, but it sets the breaks based on the number of intervals.
In the above example, I would like my groups to be precisely as follows:
10-18, 19-23, 24-28, 29-35.
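For comparison, here is a minimal sketch of the base R approach mentioned earlier, applied to these groups. The break points and the label strings are chosen for illustration. Because mpg is continuous, a label such as "19-23" here really covers the half-open interval (18, 23].

```r
# Base R cut() with manually declared breaks of unequal width.
# right = TRUE (the default) closes each interval on the right: (a, b].
# The label strings are illustrative, not produced by cut() itself.
mpg_groups <- cut(
  mtcars$mpg,
  breaks = c(10, 18, 23, 28, 35),
  labels = c("10-18", "19-23", "24-28", "29-35")
)

# Tabulate how many cars fall into each group.
table(mpg_groups)
```

The resulting factor can be passed to count() or used as the x aesthetic in a bar chart like any other categorical column.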
Is this possible using the breaks argument? Thank you.