0

I'm using MonetDB 11.41.5 and I'm trying to bulk upload a very large CSV file (70 Gb) to a table using COPY INTO.

After a few minutes, I get the following message:

enter image description here

I checked the disk space and I still have more than 60 Gb available:

enter image description here

Isn't that enough to complete the bulk load?

IMPORTANT NOTE: If I use MonetDB 11.39.11 to bulk load the same large file, it finishes without any problems. What's the difference with version 11.41.5?

Thank you.

UPDATED INFO

This large CSV file contains 225 millions of lines, so I specified the number of records in COPY INTO sentence.

I started with 230 millions, but I got and error. Then, I followed with smaller figures until 80 millions, that seemed to work fine!

enter image description here

So, the problem here is disk space? How much more do I need to acomplish the load and why version 11.39.11 doesn't behave in the same way?

New test with 640 Gb of disk space

enter image description here

enter image description here

Llorieb
  • 25
  • 5

1 Answers1

0

With the Jul2021 release, MonetDB's transaction layer got a complete overhaul. Generally the new algorithm has better and more stable performance but a side effect of the change is that COPY INTO currently needs more scratch space while loading.

In a trial I ran loading a 65GB csv file, on Oct2020 the database directory grew to 38G and remained there. On Jul2021, the size went up and down to a maximum of 58G and eventually settled on 41G. The difference between 38G and 41G is probably due to scratch space that will eventually be released.

Based on these numbers it sounds entirely plausible that 60G of free space is not sufficient for loading a 70G file. Possible solutions are,

  1. find more disk space, or
  2. load the data in smaller batches.

If you're on a Unix-like system such as Linux or macOS, the split utility may come in handy for option 2.

UPDATED ANSWER

Indeed, 650G of free space ought to be enough. This may be a bug to be reported at the bugtracker at https://github.com/MonetDB/MonetDB/issues. Fortunately, your problem is reproducible. However, 70G of data is a bit much for a reproducible testcase.

  1. Could you try if the problem also occurs if you have 225M instances of the same line instead of 225M different lines? Then you can provide in the bug report that single line, plus the CREATE TABLE statement for the table, plus you exact COPY INTO command.

  2. It might also be useful to enable some debug tracing. You can do it like this:

DROP TABLE IF EXISTS foo;
CREATE TABLE foo(i INT, j INT);

CALL logging.setcomplevel('HEAP', 'DEBUG');
CALL logging.setflushlevel('DEBUG');

COPY INTO foo FROM '/tmp/jvr/stackoverflow.csv';

SELECT COUNT(*) FROM foo;

The traces end up in the file mdbtrace.log in the database directory. Maybe this will shed some light on that's happening.

Finally, I'm curious whether the problem also occurs on newer MonetDB releases. Could you try the images docker/dev-builds:Jul2021 and docker/dev-builds:Jan2022? They are currently really rough around the edges, you have to run bash inside the container and start MonetDB manually. Also, you cannot access it from outside:

ยป docker run -p 127.0.0.1:50000:50000 -ti monetdb/dev-builds:Jul2021 /bin/bash
root@5455c46820cf:/# monetdbd start /usr/local/var/monetdb5/dbfarm 
root@5455c46820cf:/# monetdb create demo -p monetdb
created database with password for monetdb user: demo
...
  • Thank you for the response! I've tried in a different server, with 640 Gb of disk space available, but I got the same message. You can see the images in "New test with 640 Gb of disk space", within the message body. Then I've tried to upload manually with smaller batches of 80 millions lines each. The first one works perfectly, but the second produces a very strange message: "Append failed". And that's it! I think there's a deeper problem here. My MonetDB is inside a Docker container. Could that be the reason of this issue? โ€“ Llorieb Dec 16 '21 at 00:49
  • I've updated the answer with some more suggestions โ€“ Joeri van Ruth Dec 17 '21 at 11:40
  • Thank you. Finally, I reported this issue with number #7213. If you need more information, please feel free to ask. โ€“ Llorieb Dec 28 '21 at 13:58