I have a data frame named cst with columns country, ID, and age. I want to bin age (divide all IDs into deciles or quartiles) separately for each country. This is what I tried:

cut(cst[!is.na(cst$age), "age"], quantile(cst["age"], probs = seq(0,1,0.1), na.rm = T))

However, this computes the bins over the whole data frame, and I need them computed for each country separately.
Could you help me?

Rui Barradas

2 Answers

I'd try a dplyr solution, which would look something like this:

library(dplyr)
cst2 <- cst %>%
  group_by(country) %>%
  mutate(
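    # include.lowest = TRUE keeps each country's minimum age in the first bin instead of NA
    bin = cut(age, quantile(age, probs = seq(0, 1, 0.1), na.rm = TRUE), include.lowest = TRUE)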
  ) %>%
  ungroup()
snaut
  • While creating bins, I wanted to choose specific break points and let the labels (a simple sequence) adjust to the break points selected, but I get an error saying that the lengths of breaks and labels differ.
    # series
    z <- rnorm(100)
    # break points
    brk <- seq(min(z), max(z), by = 0.5)
    # break labels
    lbl <- seq(1, length(brk), by = 1)
    cut(z, breaks = brk, labels = lbl)
    If you check length(lbl) == length(brk) you get TRUE. Any idea why I get this error?
    Error in cut.default(z, breaks = brk, labels = lbl) : lengths of 'breaks' and 'labels' differ
    – seakyourpeak Apr 15 '20 at 22:25
  • This seems to be a mistake in the documentation: breaks needs to be one element longer than labels, because n break points define only n - 1 intervals. – snaut Apr 17 '20 at 07:23
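
To illustrate the point in the comment above, here is a minimal sketch of the commenter's example with one label per interval rather than one per break point. The seed and the bins variable are assumptions added for illustration; include.lowest = TRUE is also an addition, so that min(z) is not turned into NA.

```r
set.seed(1)                           # arbitrary seed, only to make the sketch reproducible
z   <- rnorm(100)
brk <- seq(min(z), max(z), by = 0.5)  # n break points define n - 1 intervals
lbl <- seq_len(length(brk) - 1)       # so we need one label fewer than break points
bins <- cut(z, breaks = brk, labels = lbl, include.lowest = TRUE)
table(bins, useNA = "ifany")          # count of values per bin, including any NA
```

Note that seq(min(z), max(z), by = 0.5) usually stops short of max(z), so values above the last break point still come back as NA unless max(z) is appended to brk.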

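All you need to do is restrict the data to one country before calling cut, and store the result so it is not discarded. This approach also avoids the dplyr dependency.

cst$bin <- NA_character_  # will hold each row's bin label
for (ctry in unique(cst$country)) {
  idx <- which(cst$country == ctry & !is.na(cst$age))  # rows of this country with a known age
  brks <- quantile(cst$age[idx], probs = seq(0, 1, 0.1))
  # include.lowest = TRUE keeps the country's minimum age in the first bin
  cst$bin[idx] <- as.character(cut(cst$age[idx], brks, include.lowest = TRUE))
}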
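
The same per-country idea can probably also be written without an explicit loop using base R's ave(). The following is only a sketch: the toy cst data frame and the helper decile_bin are hypothetical, made up here for illustration, and the bins are returned as integer codes rather than interval labels.

```r
# Hypothetical toy data standing in for the real cst
set.seed(42)
cst <- data.frame(
  country = rep(c("A", "B"), each = 50),
  ID      = 1:100,
  age     = sample(18:90, 100, replace = TRUE)
)
cst$age[c(3, 60)] <- NA  # a couple of missing ages, as in the question

# Hypothetical helper: decile bin codes for one group's ages
decile_bin <- function(x) {
  brks <- quantile(x, probs = seq(0, 1, 0.1), na.rm = TRUE)
  # unique() guards against tied quantiles, which cut() rejects as duplicate breaks
  cut(x, breaks = unique(brks), include.lowest = TRUE, labels = FALSE)
}

# ave() applies decile_bin within each country and returns results in row order
cst$bin <- ave(cst$age, cst$country, FUN = decile_bin)
```

Rows with a missing age keep NA as their bin. If tied quantiles are dropped by unique(), a country ends up with fewer than ten bins and the codes no longer correspond one-to-one to deciles, so treat that guard as a convenience rather than a fix for heavily tied data.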
Iago Carvalho