
I have two disk frames, and each is about 20 GB worth of files.

They're too big to merge as data tables because the process requires more memory than I have available. I tried using this code: output <- rbindlist(list(df1, df2))

The wrinkle is that I'd also like to run unique, since there might be duplicates in my data.

Can I use the same code with rbindlist on two disk frames?

Cauder

1 Answer


Yeah. You just do rbindlist.disk.frame(list(df1, df2))

I need to implement bind_rows at some point too!
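As a minimal sketch of the above, here is the call on two tiny stand-in disk frames. The toy data, the output directory names, and the worker count are assumptions made up for illustration. The de-duplication step is not something rbindlist.disk.frame does for you: the sketch assumes that re-sharding on every column puts identical rows in the same chunk, so that a per-chunk unique() is enough. Treat it as one possible approach rather than the recommended one, and check the function arguments against your installed disk.frame version.

```r
library(disk.frame)
library(data.table)

# A couple of background workers; adjust to your machine (assumed value).
setup_disk.frame(workers = 2)

# Small stand-ins for the two large disk frames (hypothetical data).
df1 <- as.disk.frame(data.table(id = 1:5, value = letters[1:5]), nchunks = 2)
df2 <- as.disk.frame(data.table(id = 4:8, value = letters[4:8]), nchunks = 2)

# Row-bind on disk; the result is a new disk frame stored under `outdir`.
combined <- rbindlist.disk.frame(
  list(df1, df2),
  outdir = file.path(tempdir(), "combined.df"),
  overwrite = TRUE
)
nrow(combined)  # duplicates (ids 4 and 5) are still present here

# Re-shard on all columns so identical rows end up in the same chunk.
sharded <- rechunk(
  combined,
  nchunks = 2,
  outdir = file.path(tempdir(), "sharded.df"),
  shardby = c("id", "value"),
  overwrite = TRUE
)

# Drop duplicates chunk by chunk, then write the result out as a new disk frame.
deduped <- write_disk.frame(
  cmap(sharded, function(chunk) unique(chunk)),
  outdir = file.path(tempdir(), "deduped.df"),
  overwrite = TRUE
)
nrow(deduped)
```

On real data, shardby would list whichever columns define a duplicate for you, and the re-sharding step is likely to be the slow part because it rewrites every row.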

xiaodai
  • Thanks! I'm running rbindlist.disk.frame on two 30M rows x 8 columns disk frames. It's been running for about two hours and the progress indicator hasn't moved. Is that normal? – Cauder Sep 19 '20 at 01:49
  • Eh... Not normally. 30M rows? Do you have big columns? This should be sub-minute. But of course it depends on your column size – xiaodai Sep 19 '20 at 01:51
  • Weird. The machine is still responsive, like I can press stop and it stops immediately. I wonder if the machine is just busy right now. I'll try again tomorrow. – Cauder Sep 19 '20 at 01:52
  • It's working! Strangely, the progress bar isn't showing up – Cauder Sep 20 '20 at 15:35
  • Perhaps it's a bug. Let me look into it. I am currently reworking the whole NSE system so bugs are likely. Also it's pre-v1 so thanks for testing. :) – xiaodai Sep 21 '20 at 04:14