With the Jul2021 release, MonetDB's transaction layer got a complete overhaul. Generally, the new algorithm has better and more stable performance, but a side effect of the change is that COPY INTO currently needs more scratch space while loading.
In a trial where I loaded a 65G CSV file, the database directory on Oct2020 grew to 38G and stayed there. On Jul2021, the size went up and down, peaking at 58G, and eventually settled at 41G. The difference between 38G and 41G is probably scratch space that will eventually be released.
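As a rough back-of-the-envelope check, scale the observed 58G peak for the 65G file linearly to a 70G file. This assumes peak scratch space grows in proportion to input size, which is only an approximation:

```latex
\text{peak}_{70\,\text{G}} \approx 58\,\text{G} \times \frac{70\,\text{G}}{65\,\text{G}} \approx 62.5\,\text{G} > 60\,\text{G}
```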
Based on these numbers, it sounds entirely plausible that 60G of free space is not sufficient for loading a 70G file. Possible solutions are:
- find more disk space, or
- load the data in smaller batches.
If you're on a Unix-like system such as Linux or macOS, the split utility may come in handy for the second option.
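On other platforms, or if you prefer a script, here is a minimal Python sketch of the same idea: it cuts a file into chunks of a fixed number of lines. The function name split_csv and the chunk naming scheme are hypothetical. Because it splits on raw line breaks, it assumes one record per line, with no newlines inside quoted fields. If the file has a header line, only the first chunk will contain it.

```python
def split_csv(path, lines_per_chunk, out_prefix):
    """Split the text file at `path` into chunks of at most `lines_per_chunk` lines.

    Chunks are written as <out_prefix>0000.csv, <out_prefix>0001.csv, ...
    and their paths are returned in order. Lines are copied byte-for-byte,
    so this assumes one record per line (no newlines inside quoted fields).
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    chunk_paths = []
    out = None
    try:
        with open(path, "rb") as src:
            for lineno, line in enumerate(src):
                # Start a new chunk file every lines_per_chunk lines.
                if lineno % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    chunk_path = f"{out_prefix}{len(chunk_paths):04d}.csv"
                    out = open(chunk_path, "wb")
                    chunk_paths.append(chunk_path)
                out.write(line)
    finally:
        # Make sure the last chunk is closed even if reading fails midway.
        if out is not None:
            out.close()
    return chunk_paths
```

Each chunk can then be loaded with its own statement, for example COPY INTO foo FROM '/tmp/chunk_0000.csv'; (table name and paths are placeholders).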
UPDATED ANSWER
Indeed, 650G of free space ought to be enough.
This may be a bug, which should be reported on the bug tracker at https://github.com/MonetDB/MonetDB/issues.
Fortunately, your problem is reproducible.
However, 70G of data is a bit much for a reproducible testcase.
Could you check whether the problem also occurs with 225M copies of the same line instead of 225M different lines? If it does, you can include in the bug report that single line, the CREATE TABLE statement for the table, and your exact COPY INTO command.
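A file like that is easy to generate. Below is a minimal Python sketch; the helper name write_repeated_line is hypothetical. A line such as 1|2 would match a two-integer table like foo(i INT, j INT), assuming MonetDB's default | field separator. It writes the copies in large batches so that 225M lines need only a few thousand write calls.

```python
def write_repeated_line(path, line, count, batch=100_000):
    """Write `line` to `path` exactly `count` times, one copy per line.

    Copies are written `batch` at a time, so even 225M lines need only
    a few thousand write() calls.
    """
    if count < 0 or batch <= 0:
        raise ValueError("count must be >= 0 and batch > 0")
    # Normalize to exactly one trailing newline per record.
    record = line.rstrip("\r\n") + "\n"
    full_batches, remainder = divmod(count, batch)
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        block = record * batch
        for _ in range(full_batches):
            f.write(block)
        f.write(record * remainder)
```

For example, write_repeated_line("/tmp/repeated.csv", "1|2", 225_000_000) would produce the test file (path and line are placeholders for your own data).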
It might also be useful to enable some debug tracing. You can do it like this:
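-- Start from a fresh table with two integer columns
DROP TABLE IF EXISTS foo;
CREATE TABLE foo(i INT, j INT);
-- Trace the HEAP component at DEBUG level
CALL logging.setcomplevel('HEAP', 'DEBUG');
-- Flush DEBUG-level messages to the trace file
CALL logging.setflushlevel('DEBUG');
-- Load the file that reproduces the problem
COPY INTO foo FROM '/tmp/jvr/stackoverflow.csv';
-- Check how many rows were loaded
SELECT COUNT(*) FROM foo;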
The traces end up in the file mdbtrace.log in the database directory.
Maybe this will shed some light on what's happening.
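Besides the traces, it can help to record how large the database directory gets while the COPY INTO runs, similar to the measurements at the top of this answer. Here is a minimal Python sketch that polls the directory size and reports the peak. The function names are hypothetical. Like du, it counts allocated blocks where the platform exposes st_blocks, and otherwise falls back to apparent file size. Hard-linked files would be counted more than once.

```python
import os
import time


def dir_size_bytes(root):
    """Approximate disk usage of all files under `root`, similar to du.

    Uses st_blocks (512-byte units) where the platform provides it and
    falls back to the apparent file size otherwise. Symlinks are not followed.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                # Files may be deleted while we are scanning; skip them.
                continue
            blocks = getattr(st, "st_blocks", None)
            total += blocks * 512 if blocks is not None else st.st_size
    return total


def watch_peak(root, interval_s=5.0, samples=None):
    """Print the size of `root` every `interval_s` seconds; return the peak.

    Runs until interrupted (Ctrl-C) when `samples` is None.
    """
    peak = 0
    taken = 0
    while samples is None or taken < samples:
        size = dir_size_bytes(root)
        peak = max(peak, size)
        taken += 1
        print(f"{size / 2**30:.1f} GiB (peak so far {peak / 2**30:.1f} GiB)", flush=True)
        if samples is None or taken < samples:
            time.sleep(interval_s)
    return peak
```

You would start it in a separate terminal, pointing it at your database directory inside the dbfarm, for example watch_peak("/path/to/dbfarm/yourdb", interval_s=10) (the path is a placeholder), and then run the COPY INTO.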
Finally, I'm curious whether the problem also occurs on newer MonetDB releases. Could you try the images monetdb/dev-builds:Jul2021 and monetdb/dev-builds:Jan2022? They are currently really rough around the edges: you have to run bash inside the container and start MonetDB manually. Also, you cannot access it from outside. Getting started looks like this:
» docker run -p 127.0.0.1:50000:50000 -ti monetdb/dev-builds:Jul2021 /bin/bash
root@5455c46820cf:/# monetdbd start /usr/local/var/monetdb5/dbfarm
root@5455c46820cf:/# monetdb create demo -p monetdb
created database with password for monetdb user: demo
...