
I'm having trouble using rapply over a nested list. Here's the structure of one element of the list:

$ F01    :List of 7
  ..$ 0:'data.frame':   16 obs. of  3 variables:
  .. ..$ lengths: Factor w/ 8 levels "1","2","4","5",..: 1 2 3 4 5 6 7 8 1 2 ...
  .. ..$ values : Factor w/ 2 levels "C","N": 1 1 1 1 1 1 1 1 2 2 ...
  .. ..$ Freq   : int [1:16] 1 2 0 1 1 1 1 0 1 3 ...
  ..$ 1:'data.frame':   20 obs. of  3 variables:
  .. ..$ lengths: Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. ..$ values : Factor w/ 2 levels "C","N": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ Freq   : int [1:20] 0 1 1 1 1 0 1 0 1 1 ...

I can easily apply a function to one element of the list, say F01, with lapply:

 lapply(data$F01,function(x) x[which(x[['values']]=="C"),])

Then I thought I could apply it to the whole nested list with rapply:

rapply(data,function(x) x[which(x[['values']]=="C"),],how="list")
Error in `[[.default`(x, "values") : subscript out of bounds

I don't understand why rapply raises this error, since rapply should apply the function recursively to the non-list elements, in this case the data frames. Is there something obvious that I'm missing?

Here's a sample of two complete elements of the main list:

samp <- list(structure(list(`0` = structure(list(lengths = structure(c(1L, 
    2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("1", 
    "2", "7", "8", "13", "18"), class = "factor"), values = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
    "N"), class = "factor"), Freq = c(0L, 1L, 1L, 1L, 1L, 0L, 2L, 
    0L, 0L, 0L, 0L, 1L)), .Names = c("lengths", "values", "Freq"), row.names = c(NA, 
    -12L), class = "data.frame"), `1` = structure(list(lengths = structure(c(1L, 
    2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("1", 
    "2", "3", "5", "8", "12"), class = "factor"), values = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
    "N"), class = "factor"), Freq = c(1L, 1L, 0L, 1L, 1L, 1L, 2L, 
    0L, 1L, 1L, 0L, 0L)), .Names = c("lengths", "values", "Freq"), row.names = c(NA, 
    -12L), class = "data.frame"), `2` = structure(list(lengths = structure(c(1L, 
    2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("1", 
    "3", "4", "6", "9", "19", "20"), class = "factor"), values = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
    "N"), class = "factor"), Freq = c(1L, 1L, 1L, 1L, 0L, 1L, 0L, 
    0L, 0L, 3L, 0L, 1L, 0L, 2L)), .Names = c("lengths", "values", 
    "Freq"), row.names = c(NA, -14L), class = "data.frame"), `3` = structure(list(
        lengths = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 
        2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("1", "2", "3", "4", 
        "5", "8", "11", "18"), class = "factor"), values = structure(c(1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
        ), .Label = c("C", "N"), class = "factor"), Freq = c(1L, 
        2L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 2L, 1L, 1L, 1L, 0L, 0L, 1L
        )), .Names = c("lengths", "values", "Freq"), row.names = c(NA, 
    -16L), class = "data.frame"), `4` = structure(list(lengths = structure(c(1L, 
    2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("1", 
    "2", "3", "4", "6", "11", "13"), class = "factor"), values = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
    "N"), class = "factor"), Freq = c(0L, 2L, 0L, 1L, 1L, 0L, 2L, 
    1L, 2L, 2L, 0L, 0L, 1L, 0L)), .Names = c("lengths", "values", 
    "Freq"), row.names = c(NA, -14L), class = "data.frame"), `5` = structure(list(
        lengths = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
        1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("1", "2", 
        "4", "5", "6", "7", "8", "11", "23"), class = "factor"), 
        values = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", "N"), class = "factor"), 
        Freq = c(0L, 3L, 1L, 2L, 0L, 1L, 0L, 0L, 1L, 3L, 2L, 0L, 
        0L, 1L, 0L, 1L, 1L, 0L)), .Names = c("lengths", "values", 
    "Freq"), row.names = c(NA, -18L), class = "data.frame"), `6` = structure(list(
        lengths = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
        10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("1", 
        "2", "3", "4", "5", "6", "9", "12", "13", "21", "36"), class = "factor"), 
        values = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
        "N"), class = "factor"), Freq = c(2L, 2L, 3L, 1L, 2L, 1L, 
        2L, 1L, 0L, 0L, 0L, 2L, 3L, 1L, 4L, 0L, 1L, 0L, 0L, 1L, 1L, 
        1L)), .Names = c("lengths", "values", "Freq"), row.names = c(NA, 
    -22L), class = "data.frame")), .Names = c("0", "1", "2", "3", 
    "4", "5", "6")), structure(list(`0` = structure(list(lengths = structure(c(1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("2", "13", "17", "25"
    ), class = "factor"), values = structure(c(1L, 1L, 1L, 1L, 2L, 
    2L, 2L, 2L), .Label = c("C", "N"), class = "factor"), Freq = c(1L, 
    1L, 0L, 1L, 0L, 0L, 1L, 1L)), .Names = c("lengths", "values", 
    "Freq"), row.names = c(NA, -8L), class = "data.frame"), `1` = structure(list(
        lengths = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 
        4L, 5L, 6L), .Label = c("1", "2", "3", "4", "5", "8"), class = "factor"), 
        values = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
        2L, 2L, 2L), .Label = c("C", "N"), class = "factor"), Freq = c(0L, 
        0L, 1L, 2L, 2L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)), .Names = c("lengths", 
    "values", "Freq"), row.names = c(NA, -12L), class = "data.frame"), 
        `2` = structure(list(lengths = structure(c(1L, 2L, 3L, 4L, 
        5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("2", 
        "3", "4", "7", "14", "18", "19"), class = "factor"), values = structure(c(1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
        "N"), class = "factor"), Freq = c(1L, 1L, 2L, 0L, 0L, 0L, 
        0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)), .Names = c("lengths", "values", 
        "Freq"), row.names = c(NA, -14L), class = "data.frame"), 
        `3` = structure(list(lengths = structure(c(1L, 2L, 3L, 4L, 
        5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("2", 
        "3", "5", "8", "9", "10", "19", "76"), class = "factor"), 
            values = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
            2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", "N"), class = "factor"), 
            Freq = c(1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
            1L, 1L, 0L, 1L, 1L)), .Names = c("lengths", "values", 
        "Freq"), row.names = c(NA, -16L), class = "data.frame"), 
        `4` = structure(list(lengths = structure(c(1L, 2L, 3L, 4L, 
        5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("2", 
        "5", "7", "8", "9", "16", "35"), class = "factor"), values = structure(c(1L, 
        1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
        "N"), class = "factor"), Freq = c(1L, 1L, 2L, 0L, 1L, 0L, 
        0L, 1L, 0L, 0L, 2L, 0L, 1L, 1L)), .Names = c("lengths", "values", 
        "Freq"), row.names = c(NA, -14L), class = "data.frame"), 
        `5` = structure(list(lengths = structure(c(1L, 2L, 3L, 4L, 
        5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("1", 
        "2", "3", "5", "6", "10", "11", "14", "27"), class = "factor"), 
            values = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
            1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
            "N"), class = "factor"), Freq = c(2L, 2L, 1L, 1L, 1L, 
            1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L)), .Names = c("lengths", 
        "values", "Freq"), row.names = c(NA, -18L), class = "data.frame"), 
        `6` = structure(list(lengths = structure(c(1L, 2L, 3L, 4L, 
        5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("1", 
        "2", "3", "4", "5", "6", "11", "21", "51"), class = "factor"), 
            values = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
            1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", 
            "N"), class = "factor"), Freq = c(2L, 1L, 2L, 2L, 1L, 
            1L, 0L, 0L, 0L, 3L, 0L, 2L, 0L, 1L, 1L, 1L, 1L, 1L)), .Names = c("lengths", 
        "values", "Freq"), row.names = c(NA, -18L), class = "data.frame")), .Names = c("0", 
    "1", "2", "3", "4", "5", "6")))
Chargaff
  • 1
    Why is `samp` not a nested list if that's what your data is? You're not going to be able to use `rapply` when your `list` elements are `data.frame`s, because `data.frame`s are `list`s, and so `rapply` will traverse the columns. That's why you're getting the error. It's trying to get the 'values' item from each column in your `data.frame`s. – Matthew Plourde May 16 '13 at 19:58
  • use `lapply` in your expression instead of `rapply` – eddi May 16 '13 at 20:16
  • @eddi, I think Chargaff simply posted wrong sample data. If you look at the `str` at the top of the OP, it is clearly nested – Ricardo Saporta May 16 '13 at 20:52
  • @MatthewPlourde, samp is only the first element of the nested list, so it isn't nested. I could remove it if it's confusing. – Chargaff May 16 '13 at 21:24
  • @Chargaff, it is not **just** that it is confusing, it's that such a piece of information is crucial. The entire issue here is the depth of the list. Posting a child without explaining that it is a child doesn't really help anyone trying to assist – Ricardo Saporta May 16 '13 at 21:47

1 Answer


I don't believe you actually want to use rapply here, as you do not seem to want total recursion. That is, you are not trying to apply a function to lengths and then to values, etc.

Instead, simply try two nested lapply calls:

 lapply(dat, lapply, function(x) x[which(x[['values']]=="C"),])
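
To see the difference between the two approaches, here is a minimal sketch on toy data. The names `make_df`, `toy`, `keep_c`, `leaf_classes` and `res` are hypothetical and only mimic the structure shown in the question (an outer list whose elements are lists of data frames). It illustrates that rapply hands the function individual columns, whereas two nested lapply calls stop at the data frames.

```r
# Hypothetical toy data: two outer elements, each holding two data frames
make_df <- function() {
  data.frame(lengths = factor(c(1, 2, 1, 2)),
             values  = factor(c("C", "C", "N", "N")),
             Freq    = c(1L, 0L, 2L, 1L))
}
toy <- list(F01 = list(`0` = make_df(), `1` = make_df()),
            F02 = list(`0` = make_df(), `1` = make_df()))

# rapply treats each data frame as a list and descends to its columns,
# so the function receives factors and integer vectors, never a data frame
leaf_classes <- rapply(toy, function(x) class(x)[1], how = "unlist")
print(unique(leaf_classes))

# Two nested lapply calls stop at the data-frame level instead
keep_c <- function(x) x[which(x[["values"]] == "C"), ]
res <- lapply(toy, lapply, keep_c)
print(res$F01$`0`)
```

The result `res` keeps the shape of `toy` (same outer and inner names), with each data frame reduced to its "C" rows.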
Ricardo Saporta
  • Thanks for your answer. I get the same error as with rapply. You are right, I'm not trying to apply the function to lengths AND values, but isn't rapply required to recursively apply over a nested list? – Chargaff May 16 '13 at 21:26
  • @Chargaff, the error is simply telling you that there is no element named `values` for that `x`. In other words, you're at the wrong level. The above code works for `dat <- list(samp, samp)`. I would suggest editing the OP and, instead of `samp`, using your actual `dat` (or `dput(dat[1:2])`) – Ricardo Saporta May 16 '13 at 21:41
  • I edited the question. Yes, your code works as expected on the samp data. I just don't know why it doesn't work on my actual data set, as the structure looks the same... I'll look into that. – Chargaff May 16 '13 at 21:56
  • @Chargaff, I copy + pasted the new data that you edited in. I then copied and pasted the code I have here. I do not get any error. – Ricardo Saporta May 16 '13 at 22:00
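
Since the comments above come down to being "at the wrong level", one way to check the depth of a list before choosing how many lapply calls to nest is sketched below. The helper `df_depth` is hypothetical (not part of base R) and assumes every branch has the same shape, so it only follows the first element at each level.

```r
# Hypothetical helper: how many list levels sit above the first data frame?
# Assumes all branches share the same shape; only the first element is followed.
df_depth <- function(x, depth = 0L) {
  if (is.data.frame(x)) return(depth)
  if (!is.list(x) || length(x) == 0L) return(NA_integer_)
  df_depth(x[[1]], depth + 1L)
}

df1 <- data.frame(values = factor(c("C", "N")), Freq = 1:2)
one_level  <- list(a = df1, b = df1)                 # like data$F01
two_levels <- list(F01 = one_level, F02 = one_level) # like the whole data

print(df_depth(one_level))   # 1: a single lapply reaches the data frames
print(df_depth(two_levels))  # 2: needs lapply(dat, lapply, f)

# str() with max.level gives a quick visual check of the same thing
str(two_levels, max.level = 2, give.attr = FALSE)
```

If the depth reported for the real data differs from that of the posted sample, that would explain why the same code errors on one and not the other.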