
I am trying to rbind a large list of data frames (outputDfList), which is generated by applying a complicated function to a large table with lapply. You can recreate outputDfList with:

df1=data.frame("randomseq_chr15q22.1_translocationOrInsertion", "chr15", "63126742")
names(df1)=NULL
df2=df1=data.frame("chr18q12.1_chr18q21.33_large_insertion", "chr18 ", "63126741")
names(df2)=NULL
outputDfList=list(df1,df2)

My code is:

do.call(rbind, outputDfList)

The error message I received:

Error in pi[[j]] : subscript out of bounds

I double-checked the number of columns in each data frame and they are all the same. I also tried "options(error=recover)" for debugging, but I'm not familiar enough with it to pin down the exact issue. Any help is appreciated. Thank you.

Helene
  • I’m unable to reproduce the error message. You’ll need to construct a minimal example to reproduce the problem, and post the exact code/data to reproduce it here. [reprex may be helpful for that.](http://jennybc.github.io/reprex/) – Konrad Rudolph Jan 16 '17 at 17:39
  • @KonradRudolph Thanks a lot for the comment. You are right. I added back the long names of my dataframes and I think now it should show the error. – Helene Jan 16 '17 at 18:42
  • Unfortunately this isn’t sufficient since we still don’t know exactly what your data looks like (if I try reconstructing your data from what you’ve posted, the command works). Could you please `dput` the relevant data? – Konrad Rudolph Jan 16 '17 at 19:22
  • @KonradRudolph Thank you for being so patient. I could not dput the original data because outputDfList is generated by applying a complicated function to a table with lapply. However, I was able to reproduce the error using the code above. Would you please try the code and let me know whether you see the error? Thanks a lot. – Helene Jan 16 '17 at 19:28
  • Why are you setting the column names to NULL? rbind is trying to match up columns by name - difficult if there aren't any – Richard Telford Jan 17 '17 at 21:20
  • @RichardTelford You are right. I didn't realize that. I set it to NULL to mimic my original code. The dataframes were generated with different colnames by default, so I had to reset them. Now it is fixed thank you. – Helene Jan 17 '17 at 22:42
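
To illustrate the point about name matching, here is a minimal sketch. The data frames `a` and `b` are made up for this example and are not from the question. When two data frames have the same column names in a different order, `rbind` lines the columns up by name rather than by position:

```r
# Two hypothetical data frames with the same column names in a different order.
a <- data.frame(x = 1, y = "p")
b <- data.frame(y = "q", x = 2)

# rbind matches the columns of b to those of a by name, not by position.
res <- rbind(a, b)
print(res)
```

This is why the names matter: without usable names, `rbind` has nothing to match on.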

1 Answer


After the update, it seems that your problem is invalid column names: data frame column names must be non-null.

After correcting this, the code then works:

for (i in seq_along(outputDfList)) {
    colnames(outputDfList[[i]]) = paste0('V', seq_len(ncol(outputDfList[[i]])))
}

do.call(rbind, outputDfList)
#                                       V1     V2       V3
# 1 chr18q12.1_chr18q21.33_large_insertion chr18  63126741
# 2 chr18q12.1_chr18q21.33_large_insertion chr18  63126741
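
The same renaming can probably also be done without an explicit loop. The following is only a sketch: `dfs` is a hypothetical stand-in for the list in the question (two one-row data frames whose default column names differ, as described in the comments), and the `V1..Vn` placeholder names are the same convention used in the loop above.

```r
# Hypothetical stand-in for outputDfList: same number of columns,
# but different column names in each data frame.
dfs <- list(
  data.frame(a = "seq1", b = "chr15", c = "63126742"),
  data.frame(d = "seq2", e = "chr18", f = "63126741")
)

# Give every data frame the same placeholder names V1..Vn.
dfs_named <- lapply(dfs, function(d) {
  setNames(d, paste0("V", seq_len(ncol(d))))
})

# With consistent, non-null names the row binding goes through.
combined <- do.call(rbind, dfs_named)
print(combined)
```

If the real columns have meaningful names, assigning those instead of `V1..Vn` would make the combined result easier to read.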

However, I’m puzzled how this situation occurred in the first place. Furthermore, the error message I’m getting with your code is still distinct from yours:

Error in match.names(clabs, names(xi)) :
names do not match previous names

Konrad Rudolph
  • Thanks for the reply. I am puzzled by it as well, but you are absolutely right that I need column names for my data frames. I added this to the function that generates the list of data frames, and it worked. Thank you! – Helene Jan 16 '17 at 20:19
  • 1
    I've seen both errors now. I was trying to call `do.call(rbind, myList)` on a list of data frames when I got the match.names error. The data frames all had different column names so I used `lapply(myList, unname)` thinking this would fix the problem but then when I tried `do.call()` again, I got the subscript out of bounds error described above. As described in the comments above, this has the effect of setting the column names to NULL so `rbind()` fails. – syntonicC Apr 11 '18 at 21:05
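
A quick check along these lines may help spot the problem before calling `do.call(rbind, ...)`. This is a sketch only: `check_names` is a hypothetical helper written for this example (it is not part of base R), and `myList` is made-up data with mismatched names, mirroring the situation in the last comment.

```r
# Hypothetical helper: report whether each data frame in a list has column
# names at all, and whether all of them share the names of the first one.
check_names <- function(df_list) {
  has_names <- vapply(df_list, function(d) !is.null(names(d)), logical(1))
  same_names <- all(vapply(
    df_list,
    function(d) identical(names(d), names(df_list[[1]])),
    logical(1)
  ))
  list(has_names = has_names, same_names = same_names)
}

# Made-up list of data frames with different column names.
myList <- list(data.frame(a = 1, b = 2), data.frame(c = 3, d = 4))
before <- check_names(myList)

# unname() strips the names instead of making them consistent.
stripped <- lapply(myList, unname)
after <- check_names(stripped)

print(before)
print(after)
```

If `has_names` is not all `TRUE` or `same_names` is `FALSE`, assigning a common set of names first is likely the simplest way forward.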