I am trying to import a 30 GB CSV file and convert it to HDF5 with vaex, using the code below. I read that setting convert to True should prevent an OutOfMemory error, but I still hit the error after nearly 30 minutes of loading.
import vaex

# convert=True should stream the CSV to an HDF5 file in chunks instead of loading it all at once
vaex.from_csv("combined.csv", convert=True, chunk_size=5_000_000)
I get the following error:
MemoryError: Unable to allocate 26.1 GiB for an array with shape (701, 5000000) and data type object
Looking at the vaex FAQ and documentation, this seems to be the recommended (and only) way to handle a CSV file this large. Am I missing something, or is there a better way to do this?
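Since the error mentions dtype object, one variant I am considering is shrinking the chunk size and passing explicit dtypes through to pandas.read_csv (vaex.from_csv forwards extra keyword arguments to it), so the 701 columns are not parsed as Python objects. A rough sketch of what I mean; the column names and dtypes below are placeholders, not my actual schema:

import vaex

# Smaller chunks plus explicit dtypes (forwarded to pandas.read_csv) to avoid
# materializing 5M-row object arrays; column names/dtypes here are hypothetical.
df = vaex.from_csv(
    "combined.csv",
    convert=True,
    chunk_size=500_000,
    dtype={"col_a": "float32", "col_b": "int32"},
)

Would something along these lines be expected to reduce peak memory, or is there a different approach entirely?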