8

Assume we have a nested list:

test <- list(
  list(a = 1, b = 2, c = NULL),
  list(a = NULL, b = 2, c = 3))

How do I replace all NULL values with, say, NA to preserve the structure of the data, so that I don't lose values or structure when I build a data frame from the list? For example, this drops the NULL entries:

data.frame(matrix(unlist(test), nrow = 2, byrow = T))

  X1 X2
1  1  2
2  2  3

Desired output is something like:

  X1 X2 X3
1  1  2 NA
2 NA  2  3

There are suggestions to do it this way:

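library(plyr)  # provides rbind.fill()

# drop the NULL entries of each sub-list, then let rbind.fill() pad the gaps with NA
rbind.fill(lapply(test, function(f) {
  as.data.frame(Filter(Negate(is.null), f))
}))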

This is not quite as vectorized as I'd like, and size and performance are obviously an issue. One workaround that comes to mind is replacing all NULL values at once, the way it can be done for a whole data frame, and then calling unlist() and matrix() on the list.

I'm not sure whether that would gain any performance at all. Perhaps good old lapply() isn't all that bad.

statespace
  • To replace the NULL values within the whole list, we have to access the elements inside. Using `data.table 1.9.5` you can use `rbindlist(lapply(....), fill=TRUE)` in your updated code. – akrun Apr 03 '15 at 07:46
  • ..which calls for lapply and basically ends up with same performance as your original answer? – statespace Apr 03 '15 at 07:49
  • I haven't checked the benchmarks, but I think rbindlist would be faster. – akrun Apr 03 '15 at 07:50
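A minimal sketch of the `rbindlist(lapply(...), fill=TRUE)` suggestion from the comments above, assuming the data.table package is installed. The names `rows` and `res` are only illustrative, and no claim is made here about speed relative to `rbind.fill`.

```r
library(data.table)

test <- list(
  list(a = 1, b = 2, c = NULL),
  list(a = NULL, b = 2, c = 3))

# Drop the NULL entries of each sub-list and turn what is left into a one-row table
rows <- lapply(test, function(f) as.data.table(Filter(Negate(is.null), f)))

# fill = TRUE matches columns by name and pads the missing ones with NA
res <- rbindlist(rows, fill = TRUE)
res
```

Unlike the `matrix(unlist(...))` route, this keeps the original names `a`, `b`, `c` as column names.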

4 Answers

4

We can use stri_list2matrix from the stringi package:

library(stringi)
m1 <- matrix(as.numeric(t(sapply(test, stri_list2matrix))), ncol=3)
m1
#    [,1] [,2] [,3]
#[1,]    1    2   NA
#[2,]   NA    2    3

This can then be converted to a data.frame:

as.data.frame(m1)
akrun
  • Just curious - is there an option to replace values, instead of applying loop-like functions? There are other options like `rbind.all` in `data.table` package too... It does achieve the result but the route is not quite the one I expected. – statespace Apr 03 '15 at 07:29
  • After some testing on different cases, this seems to be most universal approach. – statespace Apr 03 '15 at 08:37
  • @A.Val. Have you tried some benchmarks comparing this method and `rbindlist`? – akrun Apr 03 '15 at 08:41
  • Would this still work if my nested list has mixed vector types? (I have doubles, characters, and lists inside my list...) – Maggie Dec 17 '17 at 05:45
  • @MagO Without knowing the structure, it is difficult to say. Could you post it as a new question? – akrun Dec 17 '17 at 06:38
2

Following the answers found here, this is a (slightly twisted) way to do it in base R:

data.frame(matrix(as.numeric(as.character(unlist(test, recursive = FALSE))), nrow = length(test), byrow = TRUE))

#  X1 X2 X3
#1  1  2 NA
#2 NA  2  3
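Unpacked into steps, the one-liner above appears to work because `as.character()` turns each NULL list element into the string "NULL", which `as.numeric()` then coerces to NA (with a coercion warning). A sketch of the same idea, where the intermediate names `flat`, `chr` and `num` are only illustrative:

```r
test <- list(
  list(a = 1, b = 2, c = NULL),
  list(a = NULL, b = 2, c = 3))

# One level of unlisting keeps the NULLs as elements of a flat list of length 6
flat <- unlist(test, recursive = FALSE)

# Each element is converted to a string; NULL becomes the text "NULL"
chr <- as.character(flat)

# "NULL" cannot be parsed as a number, so it becomes NA; the warning is silenced here
num <- suppressWarnings(as.numeric(chr))

df <- data.frame(matrix(num, nrow = length(test), byrow = TRUE))
df
```

Note that this only suits lists whose non-NULL values are all numeric, since any other non-numeric string would also end up as NA.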
Cath
  • @ColonelBeauvel, thanks, although the only credit I have here is to have found the right SO page ;-) – Cath Apr 03 '15 at 10:11
1

If you know that it's just a two-level list, you can go through every element with two nested loops and test each one with is.null().

test <- list(
  list(a = 1, b = 2, c = NULL),
  list(a = NULL, b = 2, c = 3)
)

# get structure fyi
str(test)
# tells you the highest level list length
length(test)

for (i in seq_along(test))              # top level
    for (j in seq_along(test[[i]]))     # second level
        if (is.null(test[[i]][[j]])) test[[i]][[j]] <- NA

EDIT: then of course do what you were doing before.
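The same replacement can be sketched without explicit index loops, followed by the "do what you were doing before" step. The helper name `null_to_na` is hypothetical, and this still visits every sub-list through `lapply()`, so it is not claimed to be faster.

```r
test <- list(
  list(a = 1, b = 2, c = NULL),
  list(a = NULL, b = 2, c = 3))

# Replace every NULL element of one sub-list with NA (hypothetical helper)
null_to_na <- function(row) {
  row[vapply(row, is.null, logical(1))] <- NA
  row
}

filled <- lapply(test, null_to_na)

# With the NULLs gone, unlist() keeps all six values and the matrix fills correctly
df <- as.data.frame(matrix(unlist(filled), nrow = length(filled), byrow = TRUE))
df
```

Single-bracket assignment is used on purpose: `row[idx] <- NA` sets the selected elements to NA, whereas assigning NULL would remove them.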

tystanza
  • A loop is the obvious answer, and one I want to avoid at all costs. I should probably expand my question a bit to clarify. – statespace Apr 03 '15 at 07:34
0

A different approach via dput and dget and the clipboard.

test <- list(
  list(a = 1, b = 2, c = NULL),
  list(a = NULL, b = 2, c = 3))

dput(test,file="clipboard-1024")
tmp=dget(textConnection(gsub("NULL","NA",readChar(con="clipboard-1024",file.info("clipboard-1024")$size))))
data.frame(matrix(unlist(tmp), nrow = 2, byrow = T))

#  X1 X2 X3
#1  1  2 NA
#2 NA  2  3

I have no idea how fast or slow this is.
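The same text-substitution idea can be sketched without the clipboard (the "clipboard-1024" connection is, as far as I know, Windows-specific) by deparsing the list to a string in memory. The names `txt` and `tmp` are only illustrative, and this is an assumption-laden variant rather than the code above.

```r
test <- list(
  list(a = 1, b = 2, c = NULL),
  list(a = NULL, b = 2, c = 3))

# deparse() returns the R source text of the object, possibly split over several lines
txt <- paste(deparse(test), collapse = "\n")

# Swap the literal NULL for NA in the source text, then evaluate it back into a list
tmp <- eval(parse(text = gsub("NULL", "NA", txt, fixed = TRUE)))

data.frame(matrix(unlist(tmp), nrow = length(tmp), byrow = TRUE))
```

Be aware that the substitution is purely textual, so any character value or name containing "NULL" would be altered as well.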

cryo111