
I have 3 large files with thousands of observations each (file_1 = 6314 rows, file_2 = 11020 rows, file_3 = 2757 rows). I need to join them, so I used full_join() from the dplyr package. When I run the code, the only output I get is this error: "Error: std::bad_alloc".

How do I fix this?

This is my code:

library(dplyr)
det = full_join(det1, det2, by = "collectioncode")
det = full_join(det, det3, by = "collectioncode")
Kristen Cyr

1 Answer


I may be late to the party, but I ran into similar problems with joins that should not have been expensive (which is how I found this thread). Then I noticed that, by default, dplyr's joins treat NA keys as matching each other (na_matches = "na"), so every row with an NA key on one side is paired with every NA-key row on the other. That blew my dataset up into the millions of rows. I fixed my issue by setting na_matches to "never", e.g.:

det = full_join(det1, det2, by = "collectioncode", na_matches = "never")
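
To illustrate the effect, here is a minimal sketch on made-up data. The data frames a and b and the columns val_a and val_b are hypothetical and are not the asker's data; only the key name collectioncode is taken from the question. It compares the row counts of the two settings and counts the NA keys in each input, which is a quick way to check whether this is likely the cause before running the real join.

```r
library(dplyr)

# Hypothetical toy data: 3 of the 4 keys in each table are NA.
a <- data.frame(collectioncode = c("X1", NA, NA, NA), val_a = 1:4)
b <- data.frame(collectioncode = c("X1", NA, NA, NA), val_b = 5:8)

# Default (na_matches = "na"): each NA key in `a` is paired with each NA key
# in `b`, so the NA rows alone contribute 3 * 3 rows. Recent dplyr versions
# may also warn about a many-to-many relationship here.
n_default <- nrow(full_join(a, b, by = "collectioncode"))

# na_matches = "never": NA keys match nothing, so each NA-key row from
# either side is kept once as an unmatched row.
n_never <- nrow(full_join(a, b, by = "collectioncode", na_matches = "never"))

# Counting NA keys up front gives a rough estimate of the blow-up:
# roughly na_a * na_b extra rows under the default setting.
na_a <- sum(is.na(a$collectioncode))
na_b <- sum(is.na(b$collectioncode))

print(c(default = n_default, never = n_never, na_a = na_a, na_b = na_b))
```

With real data, a few thousand NA keys in each table multiply into millions of rows, which would be consistent with a std::bad_alloc (out of memory) error. Repeated non-NA keys multiply in the same way, so that is also worth checking if the NA counts turn out to be small.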

Hope this helps!

Tim K