My question's title almost matches the dlply (plyr package) description, except for the "nested" part.

Let me explain with an example:

library(plyr)
res <- dlply(mtcars, c("gear", "carb"), identity)
head(res, 2)
# $`3.1`
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
# 
# $`3.2`
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Dodge Challenger  15.5   8  318 150 2.76 3.520 16.87  0  0    3    2
# AMC Javelin       15.2   8  304 150 3.15 3.435 17.30  0  0    3    2
# Pontiac Firebird  19.2   8  400 175 3.08 3.845 17.05  0  0    3    2

As you can see, the output is a list where the names (keys) are the concatenation of the two variables I used for splitting the data, e.g. "3.1" is the key for (gear = 3, carb = 1).

Instead, I would like my result to be a nested list so that the elements can be accessed through two sets of keys, one for each of my splitting variables: res[["3"]][["1"]].

Is there something around, not necessarily from the plyr package, that can achieve this? I'd like the answer to be generalizable to any number of splitting variables. Also, it is important that I can apply any function although my example used the identity function, resulting in a mere split of the data. Thank you for your suggestions.

flodel

2 Answers

I came up with a solution myself; it uses recursion:

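nested.dlply <- function(df, by, fun, ...) {

   require(plyr)

   if (length(by) == 1) {
      # Base case: one splitting variable left, so apply fun to each piece
      dlply(df, by, fun, ...)
   } else {
      # Split on the first variable, then recurse on each piece with the rest
      dlply(df, by[1], nested.dlply, by[-1], fun, ...)
   }
}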

Here are a couple examples:

nested.dlply(mtcars, c("gear", "carb"), identity)
# $`3`
# $`3`$`1`
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
# 
# $`3`$`2`
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Dodge Challenger  15.5   8  318 150 2.76 3.520 16.87  0  0    3    2
# AMC Javelin       15.2   8  304 150 3.15 3.435 17.30  0  0    3    2
# Pontiac Firebird  19.2   8  400 175 3.08 3.845 17.05  0  0    3    2
# [...]

nested.dlply(mtcars, c("gear", "carb"), head, 2)
# $`3`
# $`3`$`1`
#                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# 
# $`3`$`2`
#                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
# Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
# Dodge Challenger  15.5   8  318 150 2.76 3.52 16.87  0  0    3    2
# [...]

I doubt this is very efficient, but it does the job. I still welcome your suggestions. Ideally, I was hoping some package had already implemented it.
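
As a quick check that the result can be indexed with one key per splitting variable, and that the recursion extends past two variables, something like the following sketch should work. It assumes plyr is installed and repeats the definition above so the block runs on its own; the three-variable call with `am` and the use of `nrow` are illustrative choices, not part of the original examples.

```r
library(plyr)

# Same recursive helper as above, repeated so this block runs on its own
nested.dlply <- function(df, by, fun, ...) {
   if (length(by) == 1) {
      dlply(df, by, fun, ...)
   } else {
      dlply(df, by[1], nested.dlply, by[-1], fun, ...)
   }
}

res <- nested.dlply(mtcars, c("gear", "carb"), identity)
# One key per splitting variable: gear = 3, then carb = 1
res[["3"]][["1"]]

# Three splitting variables (illustrative), counting the rows in each leaf
counts <- nested.dlply(mtcars, c("gear", "carb", "am"), nrow)
counts[["4"]][["1"]][["1"]]
```

Each level of the list is keyed by the values of one variable, in the order given in `by`.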

flodel
  • I'm not sure if you saw [this post](http://stackoverflow.com/q/7247108/1270695), but Brian Diggs provided the `plyr` solution there: `dlply(mtcars, .(gear), dlply, .(carb))` and so on, for more nesting. As discussed at that question, such a nested data structure might not be the most convenient to work with. – A5C1D2H2I1M1N2O1R2T1 Jul 18 '12 at 16:55
  • Thank you @mrdwab. I think that the only way to generalize the nested `dlply` calls suggested by Brian Diggs is to use a recursion like I have above. The link you provided did help me make my code a bit shorter (edited). – flodel Jul 19 '12 at 01:31

What about nesting split?

temp = lapply(split(mtcars, mtcars$gear), function(x) split(x, x$carb))
temp[["3"]]["1"]
# $`1`
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
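
The nested `split` idea can probably be generalized to any number of variables, and to an arbitrary function, with a small recursive wrapper in base R. The following is only a sketch: `nested_split` is a hypothetical helper name (not a function from base R or plyr), and it assumes every entry of `by` names a column of `df`.

```r
# Hypothetical helper, not part of base R or plyr
nested_split <- function(df, by, fun = identity, ...) {
  # No splitting variables left: apply fun to this piece
  if (length(by) == 0L) {
    return(fun(df, ...))
  }
  # Split on the first variable; drop = TRUE skips factor levels with no rows
  pieces <- split(df, df[[by[1]]], drop = TRUE)
  # Recurse into each piece with the remaining variables
  lapply(pieces, function(piece) nested_split(piece, by[-1], fun, ...))
}

nested <- nested_split(mtcars, c("gear", "carb"))
nested[["3"]][["1"]]   # rows with gear = 3 and carb = 1

# Any function and any number of variables, e.g. the first two rows per piece
top2 <- nested_split(mtcars, c("gear", "carb", "am"), head, 2)
```

Unlike `dlply`, this returns plain lists without plyr's split attributes, which may or may not matter depending on what is done with the result afterwards.
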
A5C1D2H2I1M1N2O1R2T1
  • Thanks, but it does not exactly address my problem; see the end of my question: I need it to be generalized to any number of split variables (not just two), and it needs to apply a function (not just do a split). – flodel Jul 18 '12 at 12:32