If we want to make a reproducible question on a complex/large dataset for SO, we can use dput(head(df)) to reduce the size.

Is there a similar approach to reduce the size of complex nested lists with varying list lengths? One idea would be to take the first few elements (say, the first 3) of each list, irrespective of element type (numeric, character, etc.) and nesting depth, but I'm not sure how to do this.

#sample nested list
L <- list(
  list(1:10),
  list( list(1:10), list(1:10,1:10) ),
  list(list(list(list(1:10))))
)

Running dput(L) will naturally produce the structure for the whole list. Is there a simple way to reduce the overall length of this list (something like dput(head(L)))?

I don't want to edit the structure of the list, e.g. I don't want to flatten first or anything - I just want to reduce the size of it and keep all attributes etc.

Thanks

Edit

@thelatemail solution works well:

rapply(L, f = head, n = 3, how = "list")

What if the list contains a data.frame, though? This approach splits the df into separate lists, one per column (which I assume is to be expected, since how = "list" is specified in the rapply call). Is there a way to modify this so that it returns head(df) as a data.frame? Example with a df included:

L_with_df <- list(
  list(1:10),
  list( list(1:10), list(1:10,1:10), df = data.frame(a = 1:20, b = 21:40) ),
  list(list(list(list(1:10))))
)
rapply(L_with_df, f = head, n = 3, how = "list")
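
To make the problem concrete, here is a quick check (a minimal sketch; it redefines L_with_df so it runs on its own, and `res` is just an illustrative name). As far as I can tell, with how = "list" rapply descends into the data.frame as if it were a plain list of columns, so each column is truncated separately and the data.frame class is lost:

```r
L_with_df <- list(
  list(1:10),
  list(list(1:10), list(1:10, 1:10), df = data.frame(a = 1:20, b = 21:40)),
  list(list(list(list(1:10))))
)

res <- rapply(L_with_df, f = head, n = 3, how = "list")

is.data.frame(res[[2]]$df)  # FALSE: the df came back as a plain list
names(res[[2]]$df)          # column names are kept: "a" "b"
lengths(res[[2]]$df)        # each column was cut to 3 elements
```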

Edit 2

It seems rapply won't work on data.frames, see here.

However, rrapply (here), which is an extension of rapply, seems to do what I want:

library(rrapply)
rrapply(L_with_df, f = head, n = 3, dfaslist = FALSE)
# [[1]]
# [[1]][[1]]
# [1] 1 2 3


# [[2]]
# [[2]][[1]]
# [[2]][[1]][[1]]
# [1] 1 2 3


# [[2]][[2]]
# [[2]][[2]][[1]]
# [1] 1 2 3

# [[2]][[2]][[2]]
# [1] 1 2 3


# [[2]]$df
#   a  b
# 1 1 21
# 2 2 22
# 3 3 23


# [[3]]
# [[3]][[1]]
# [[3]][[1]][[1]]
# [[3]][[1]][[1]][[1]]
# [[3]][[1]][[1]][[1]][[1]]
# [1] 1 2 3


# Warning message:
# In rrapply(L_with_df, f = head, n = 3, dfaslist = FALSE) :
#   'dfaslist' is deprecated, use classes = 'data.frame' instead

#this produces different output?:
#rrapply(L_with_df, f = head, n = 3, classes = "data.frame")
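
If an extra package is not an option, a small hand-written recursive function in base R might also work. This is only a sketch under the assumption that data.frames should be treated as leaves; `head_nested` is a hypothetical helper name, not something from base R or rrapply:

```r
L_with_df <- list(
  list(1:10),
  list(list(1:10), list(1:10, 1:10), df = data.frame(a = 1:20, b = 21:40)),
  list(list(list(list(1:10))))
)

# Hypothetical helper: recurse into plain lists, but call head() directly
# on anything else, including data.frames.
head_nested <- function(x, n = 3) {
  if (is.list(x) && !is.data.frame(x)) {
    # x[] <- ... replaces the elements in place, keeping names and attributes
    x[] <- lapply(x, head_nested, n = n)
    x
  } else {
    head(x, n)
  }
}

res <- head_nested(L_with_df)
str(res)
```

Like the rapply version, this shortens only the leaves; the number of elements in each list stays the same.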
user63230
  • I don't really want to install all those packages to check with the provided example, but would something like `rapply(mylist, f=head, n=3, how="list")` work for you? – thelatemail Nov 11 '20 at 22:06
  • Also a possible duplicate of https://stackoverflow.com/questions/57404856/how-to-subset-a-vector-inside-list-of-list/57404876 , the only other time I've ever suggested using `rapply` – thelatemail Nov 11 '20 at 22:16

1 Answer

Let's create a nested list to serve as an example.

L <- list(
  list(1:10),
  list( list(1:10), list(1:10,1:10) ),
  list(list(list(list(1:10))))
)

It has the following structure:

str(L)
#List of 3
# $ :List of 1
#  ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ :List of 2
#  ..$ :List of 1
#  .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
#  ..$ :List of 2
#  .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
#  .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ :List of 1
#  ..$ :List of 1
#  .. ..$ :List of 1
#  .. .. ..$ :List of 1
#  .. .. .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10

I think the recursive apply, rapply, with head as the function f, can handle this without breaking the structure.

rapply(L, f=head, n=3, how="list")

Checks out:

str(rapply(L, f=head, n=3, how="list"))
#List of 3
# $ :List of 1
#  ..$ : int [1:3] 1 2 3
# $ :List of 2
#  ..$ :List of 1
#  .. ..$ : int [1:3] 1 2 3
#  ..$ :List of 2
#  .. ..$ : int [1:3] 1 2 3
#  .. ..$ : int [1:3] 1 2 3
# $ :List of 1
#  ..$ :List of 1
#  .. ..$ :List of 1
#  .. .. ..$ :List of 1
#  .. .. .. ..$ : int [1:3] 1 2 3
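
Since the goal in the question was a shorter dput(), the truncated list can be passed straight to it. A minimal sketch (`small` is just an illustrative name, and L is redefined so the snippet runs on its own):

```r
L <- list(
  list(1:10),
  list(list(1:10), list(1:10, 1:10)),
  list(list(list(list(1:10))))
)

# Shorten every leaf vector to its first 3 elements, keeping the nesting
small <- rapply(L, f = head, n = 3, how = "list")

# Print a compact, copy-pasteable definition of the reduced list
dput(small)
```

Note that only the leaf vectors are shortened; the number of elements in each list at every level is unchanged.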
thelatemail
  • thanks this is almost what I was after. What if we had a `df` in the list as well, this approach would get `head` of each column rather than just overall `head(df)` (this is intended I assume as you specify `list` in `rapply`?). See update in question. Would you know if there is a way around this? – user63230 Nov 12 '20 at 09:22
  • @user63230 - After a few futile attempts, I can't seem to get `rapply` to handle `data.frames` nicely. I'll keep thinking but another approach may be needed altogether to be more flexible. – thelatemail Nov 12 '20 at 22:06
  • @Matthew Plourde suggests [here](https://stackoverflow.com/questions/16596198/rapply-over-a-nested-list-in-r) that `rapply` will not work on lists that include a `data.frame`, but I then found this [question](https://stackoverflow.com/questions/17971073/rapply-to-nested-list-of-data-frames-in-r), which suggests that the package `rrapply`, an extension of `rapply`, can do this. It seems to do exactly what I want, see update. I'm trying to think of obscure items in a list that this wouldn't handle but I can't think of any, can you? – user63230 Nov 13 '20 at 12:38