
I have several groups, say A, B, and C, and I want to cut another variable based on these groups, i.e. each group has its own breaks for the same variable.

If I had to calculate the group means, I'd use tapply like this:

tapply(mydata$var,mydata$group,mean)

Unfortunately, I do not know how to do this with cut when the breaks=c(...) argument changes from group to group.

tapply(mydata$var,mydata$group,cut)

Any suggestions? I'd like to do it with tapply, but any other solution except a custom-made function would be suitable too.

EDIT: some small example:

test <- data.frame(var=rnorm(100,0,1),
               group=c(rep("A",30),
                       rep("B",20),
                       rep("C",50)))
# for group A:
cut(test$var[test$group == "A"],breaks=c(-4,0,4))
# for group B:
cut(test$var[test$group == "B"],breaks=c(-4,1,4))

and so on...

Matt Bannert
  • Can you construct a small example? Right now it's unclear how you'd like `group` to determine/direct the selection of `cut()` breakpoints. – Josh O'Brien Dec 23 '11 at 15:22

2 Answers


I'm going to put my mind-reading hat on here and take a stab that you want something like this:

dat <- data.frame(x = runif(100),grp = rep(letters[1:3],length.out = 100))

# SIMPLIFY = FALSE always returns a list with one factor per group
mapply(cut,split(dat$x,dat$grp),list(c(-Inf,0.5,Inf),
                                     c(-Inf,0.1,0.5,0.9,Inf),
                                     c(-Inf,0.25,0.5,0.75,Inf)),
       SIMPLIFY = FALSE)

So this is simply splitting x by grp and applying cut to each piece using different breaks for each piece.
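If the result has to go back into the original data frame in its original row order, one possible sketch (an illustration, not necessarily what was intended above) converts each piece's labels to character and recombines them with `unsplit()`. The `breaks` list, the column name `size_class`, and the seed are assumptions made up for this example.

```r
set.seed(1)
dat <- data.frame(x = runif(100), grp = rep(letters[1:3], length.out = 100))

# Hypothetical per-group breaks, named after the groups they belong to
breaks <- list(a = c(-Inf, 0.5, Inf),
               b = c(-Inf, 0.1, 0.5, 0.9, Inf),
               c = c(-Inf, 0.25, 0.5, 0.75, Inf))

# Cut each group's values with that group's own breaks
groups <- split(dat$x, dat$grp)
pieces <- mapply(cut, groups, breaks[names(groups)], SIMPLIFY = FALSE)

# Factors with different levels cannot be merged into one factor directly,
# so store the interval labels as character and restore the row order
dat$size_class <- unsplit(lapply(pieces, as.character), dat$grp)

head(dat)
```

Keeping the labels as character avoids the conflict between the groups' level sets; the column can still be used for grouping, for example together with `grp` in `interaction()` or `aggregate()`.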

joran
  • There's a problem with this solution: I need to unlist the result because I want to add the resulting factor back to the original data.frame. Unlisting just scrambles the row order. – Matt Bannert Dec 23 '11 at 15:37
  • @ran2 Then I'm genuinely confused; cutting each piece using different breaks will create factors with different levels. If you're going to recombine them into one factor, you can only have one set of levels. – joran Dec 23 '11 at 15:41
  • I see. Honestly, I did not consider that. I probably cannot use factors then. I use these breaks as size classes, and the classes depend on the group the data is in. My data.frame contains several variables, which are aggregated in specific procedures that combine them. I need to perform the aggregation separately for every size class, so it does not help much if only the group and x variables are split by split. Maybe my whole approach is just not so good. Any better ideas? – Matt Bannert Dec 23 '11 at 15:50
  • +1 for mind-reading faster than I could provide my late example. At first glance it looks like I might accept this soon. – Matt Bannert Dec 23 '11 at 16:44

Actually, R behaves quite cleverly here. I found a solution that works the way I initially had in mind, though it does not use the apply family. ifelse drops the factor attributes of what cut returns and keeps only the integer codes, which is why this solution has no problem with differing factor levels, as joran points out.

dat <- data.frame(x = rnorm(100),grp = rep(letters[1:3],length.out = 100))
ifelse(dat$grp == "a",cut(dat$x,breaks=c(-Inf,0.1,0.2,Inf)),
       ifelse(dat$grp == "b",cut(dat$x,breaks=c(-Inf,0.1,1,Inf)),
              cut(dat$x,breaks=c(-Inf,0.9,2,Inf))) )
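To see what the nested `ifelse` actually returns, here is a small check, assuming the same breaks as above; the names `f_a`, `f_b`, `f_c`, `codes`, and `labels_a`, and the seed, are made up for illustration. Note that a given integer code refers to a different interval in each group, so the codes are only meaningful together with `grp`.

```r
set.seed(42)
dat <- data.frame(x = rnorm(100), grp = rep(letters[1:3], length.out = 100))

# One factor per set of breaks, each computed on the full vector
f_a <- cut(dat$x, breaks = c(-Inf, 0.1, 0.2, Inf))
f_b <- cut(dat$x, breaks = c(-Inf, 0.1, 1, Inf))
f_c <- cut(dat$x, breaks = c(-Inf, 0.9, 2, Inf))

# ifelse() keeps only the underlying integer codes of the factors
codes <- ifelse(dat$grp == "a", f_a,
                ifelse(dat$grp == "b", f_b, f_c))

class(f_a)     # the input is a factor
typeof(codes)  # the result is a plain integer vector without levels

# Codes can be mapped back to interval labels one group at a time
is_a <- dat$grp == "a"
labels_a <- levels(f_a)[codes[is_a]]
head(labels_a)
```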
Matt Bannert
  • Ok, I see. You could probably achieve the same result on the output of `mapply` by coercing each piece with `as.integer` and then `unlist`ing. Glad you figured it out, though. – joran Dec 23 '11 at 16:34
  • Without your comment I would have probably tried forever and a day. Simply did not think about being limited to one set of factor levels – which is sooo obvious if you are aware. Thx! – Matt Bannert Dec 23 '11 at 16:42
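The `as.integer` idea from the comments can be sketched roughly as follows. This is only one way to do it: it swaps the plain `unlist` for `unsplit()` so that the codes land back in the original row order, and the `breaks` list, the column name `class_code`, and the seed are assumptions for illustration.

```r
set.seed(1)
dat <- data.frame(x = rnorm(100), grp = rep(letters[1:3], length.out = 100))

# Hypothetical per-group breaks, named after the groups they belong to
breaks <- list(a = c(-Inf, 0.1, 0.2, Inf),
               b = c(-Inf, 0.1, 1, Inf),
               c = c(-Inf, 0.9, 2, Inf))

# Cut each group with its own breaks and keep only the integer codes
groups <- split(dat$x, dat$grp)
codes <- mapply(function(x, b) as.integer(cut(x, breaks = b)),
                groups, breaks[names(groups)], SIMPLIFY = FALSE)

# unsplit() reverses split(), so every code returns to its original row
dat$class_code <- unsplit(codes, dat$grp)

# The nested ifelse() version, computed for comparison
via_ifelse <- ifelse(dat$grp == "a", cut(dat$x, breaks$a),
                     ifelse(dat$grp == "b", cut(dat$x, breaks$b),
                            cut(dat$x, breaks$c)))
identical(dat$class_code, via_ifelse)
```

Compared with the nested `ifelse`, this version scales to more groups without further nesting, since adding a group only means adding an entry to `breaks`.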