How do I bind two disk frames together?

Question

I have two disk frames, each about 20 GB of files.

It's too big to combine as data.tables, because the process needs more memory than I have available. I tried this code:

    output <- rbindlist(list(df1, df2))

The wrinkle is that I'd also like to run unique(), since there may be duplicate rows in my data.

Can I use the same code with rbindlist on two disk frames?

score 2 · Accepted Answer · answered Sep 17 '20 at 02:30


Yes. Just use rbindlist.disk.frame:

    rbindlist.disk.frame(list(df1, df2))

I need to implement bind_rows at some point too!
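On the unique() part of the question: if duplicates are removed one chunk at a time, two copies of a row that sit in different chunks both survive. One workaround is to shard rows by their content first, so every copy of a row lands in the same shard, and then deduplicate each shard on its own. Below is a minimal base-R sketch of that idea. It assumes each frame is simply a folder of .rds chunk files, standing in for a disk.frame folder (which stores .fst chunks). The helpers write_chunks(), shard_of() and bind_unique() are hypothetical and not part of the disk.frame API.

```r
# Sketch only (base R): bind two chunked, on-disk tables and drop duplicate
# rows without loading both inputs at once. A "frame" here is a folder of
# .rds chunk files, standing in for a disk.frame folder (assumption).

# Hypothetical helper: write a data.frame to `dir` as n_chunks .rds files.
write_chunks <- function(df, dir, n_chunks) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  groups <- split(seq_len(nrow(df)), rep_len(seq_len(n_chunks), nrow(df)))
  for (i in seq_along(groups)) {
    saveRDS(df[groups[[i]], , drop = FALSE],
            file.path(dir, sprintf("chunk_%03d.rds", i)))
  }
  invisible(dir)
}

# Hypothetical helper: shard id (1..n_shards) computed from each row's values.
# Identical rows build identical keys, so they always map to the same shard.
# Summing the key's bytes is a weak hash, used only to keep the sketch short.
shard_of <- function(df, n_shards) {
  keys <- do.call(paste, c(unname(as.list(df)), sep = "\r"))
  sums <- vapply(keys, function(k) sum(as.integer(charToRaw(k))),
                 integer(1), USE.NAMES = FALSE)
  sums %% n_shards + 1
}

# Hypothetical helper: bind every chunk in `in_dirs`, deduplicate, and write
# one output chunk per non-empty shard into `out_dir`.
bind_unique <- function(in_dirs, out_dir, n_shards = 4) {
  tmp <- file.path(out_dir, "tmp")
  dir.create(tmp, recursive = TRUE, showWarnings = FALSE)
  chunk_files <- unlist(lapply(in_dirs, list.files,
                               pattern = "\\.rds$", full.names = TRUE))

  # Pass 1: read one input chunk at a time and write its rows out by shard.
  for (j in seq_along(chunk_files)) {
    chunk <- readRDS(chunk_files[j])
    pieces <- split(chunk, shard_of(chunk, n_shards))
    for (s in names(pieces)) {
      saveRDS(pieces[[s]], file.path(tmp, sprintf("s%s_c%05d.rds", s, j)))
    }
  }

  # Pass 2: every copy of a row is in the same shard, so unique() on one
  # shard at a time removes duplicates across the whole bound result.
  out_files <- character(0)
  for (s in seq_len(n_shards)) {
    parts <- list.files(tmp, pattern = sprintf("^s%d_", s), full.names = TRUE)
    if (length(parts) == 0) next
    shard <- unique(do.call(rbind, lapply(parts, readRDS)))
    out_file <- file.path(out_dir, sprintf("chunk_%03d.rds", s))
    saveRDS(shard, out_file)
    out_files <- c(out_files, out_file)
  }
  unlink(tmp, recursive = TRUE)
  out_files
}

# Usage: df1 and df2 share three rows, which should appear once in the result.
df1 <- data.frame(id = 1:6, grp = c("a", "b", "a", "b", "a", "b"))
df2 <- data.frame(id = 4:9, grp = c("b", "a", "b", "a", "b", "a"))
d1 <- write_chunks(df1, file.path(tempdir(), "df1"), n_chunks = 3)
d2 <- write_chunks(df2, file.path(tempdir(), "df2"), n_chunks = 2)
out <- bind_unique(c(d1, d2), file.path(tempdir(), "out"))
result <- do.call(rbind, lapply(out, readRDS))
nrow(result)  # 9
```

Peak memory is roughly one shard rather than both inputs, and raising n_shards lowers it further. The byte-sum hash in shard_of() is deliberately simple; on real data a better-distributed hash would keep the shards closer in size.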


xiaodai (14,889 reputation)

Thanks! I'm running rbindlist.disk.frame on two disk frames of 30M rows x 8 columns each. It's been running for about two hours and the progress indicator hasn't moved. Is that normal? – Cauder Sep 19 '20 at 01:49
Eh... not normally. 30M rows? Unless you have big columns, this should take under a minute, but of course it depends on your column sizes. – xiaodai Sep 19 '20 at 01:51
Weird. The machine is still responsive; for example, I can press stop and it stops immediately. I wonder if the machine is just busy right now. I'll try again tomorrow. – Cauder Sep 19 '20 at 01:52
It's working! Strangely, the progress bar isn't showing up – Cauder Sep 20 '20 at 15:35

Perhaps it's a bug. Let me look into it. I am currently reworking the whole NSE (non-standard evaluation) system, so bugs are likely. Also, it's pre-v1, so thanks for testing. :) – xiaodai Sep 21 '20 at 04:14
