
I have a list of 100,000 data frames and want to full join them into one data frame. I tried `Reduce`, but it is very slow. Here is an example of my code:

library(dplyr)  # for full_join() and the %>% pipe

# Example list of data frames sharing the key column "b"
dfs <- list(
    df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
    df2 = data.frame(c = 4:6, b = c("a", "c", "d")),
    df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
)

# Fold the list pairwise from left to right with full_join()
z <- dfs %>%
    Reduce(function(dtf1, dtf2) dplyr::full_join(dtf1, dtf2), .)

Can anyone suggest how to optimize this chain of full joins?

  • Did you mean `Reduce` instead of `Replace`? – akrun May 09 '17 at 06:06
  • Possible duplicate of [How to join multiple data frames using dplyr?](http://stackoverflow.com/questions/34344214/how-to-join-multiple-data-frames-using-dplyr) – Adam Quek May 09 '17 at 06:08
  • Full joins on that quantity of data.frames have the potential to get very, very large, depending on the data structure. – alistaire May 09 '17 at 06:12
  • I used this code as an example for the optimisation question because I have to deal with a large list of data frames. I also tried [How to join multiple data frames using dplyr?](http://stackoverflow.com/questions/34344214/how-to-join-multiple-data-frames-using-dplyr), but it is very slow with the large list. – Tan Trinh Cong May 09 '17 at 07:31
  • What does *very slow* mean? 5 mins? 50 mins? 5 hrs? Does your machine have sufficient RAM? – Parfait May 09 '17 at 19:55
  • I ran it on a list of 100,000 data.frames and it took more than one day. As I understand it, `z <- dfs %>% Reduce(function(dtf1,dtf2) dplyr::full_join(dtf1,dtf2), .)` joins one pair of data frames at a time, working from left to right, so it takes a lot of time as the list grows. – Tan Trinh Cong May 10 '17 at 04:53
  • By the way, I found a solution using `rbind.fill(dfs)`. It simply combines the data frames by rows and fills columns missing from a given data frame with NA (see the sketch after these comments). **Thanks to all for your comments.** – Tan Trinh Cong May 10 '17 at 05:22
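
A minimal sketch of that `rbind.fill()` approach, reusing the example list from the question; it assumes the `plyr` package is installed (`dplyr::bind_rows(dfs)` behaves the same way on a list of data frames). The point is that the whole list is processed in a single call instead of being folded through 99,999 pairwise joins.

dfs <- list(
    df1 = data.frame(a = 1:3, b = c("a", "b", "c")),
    df2 = data.frame(c = 4:6, b = c("a", "c", "d")),
    df3 = data.frame(d = 7:9, b = c("b", "c", "e"))
)

# Stack all data frames by rows in one call; columns that are missing
# from a given data frame are filled with NA in the result.
# rbind.fill() is from the plyr package; dplyr::bind_rows(dfs) is an
# equivalent alternative.
z <- plyr::rbind.fill(dfs)

The caveat: row-binding is not a full join. It never merges rows that share the key `b`, so it is only a valid replacement when stacking the data frames, rather than merging them on a key, is what is actually needed.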

0 Answers