Specifying output directory in vaex.from_csv()

Question

I am using Python's Vaex library in a Kaggle notebook to convert a .csv dataset to .hdf5 using the vaex.from_csv() method. I am unable to find a way to specify the output directory for the hdf5 file.

By default, the method creates the file in the same directory as the input file, which Kaggle blocks because it designates a separate folder for creating files. Copying the dataset to the output destination and then converting is not an option due to space limitations.

Googling, Stack Overflow, and the docs have not been fruitful. The method does have fs_options and fs parameters, which may be related to this; however, I am unsure what they do, as I could not find an explanation for them.

Link to method docs: https://vaex.readthedocs.io/en/latest/api.html#vaex.from_csv

Assistance with this will be appreciated.

score 0 · Answer 1 · answered Aug 04 '22 at 21:17

from_csv is just a function for reading a CSV file with vaex.

So you should do something like

import vaex
df = vaex.from_csv('path/to/file.csv')

Then once you have the dataframe loaded, you can do:

df.export_hdf5('path/to/converted_file.hdf5')

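If you are confused about the convert kwarg in from_csv: the docs say you can pass a string instead of a bool, and that string should be interpreted as the path of the converted file. So passing a path in your writable folder should let you choose the output directory directly.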
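To make the convert route concrete, here is a minimal sketch rather than a tested recipe. It assumes the usual Kaggle layout where the dataset folder is read-only and /kaggle/working is writable. The helper names hdf5_target and convert_csv are hypothetical and made up for illustration, and the paths are placeholders.

```python
from pathlib import Path


def hdf5_target(csv_path, out_dir):
    """Return the .hdf5 path inside out_dir that reuses the CSV's base name."""
    return Path(out_dir) / (Path(csv_path).stem + ".hdf5")


def convert_csv(csv_path, out_dir):
    """Convert csv_path to HDF5 inside out_dir and return the vaex dataframe."""
    # Imported here so the path helper above works even without vaex installed.
    import vaex

    target = hdf5_target(csv_path, out_dir)
    # Make sure the output folder exists (assumed to be writable).
    target.parent.mkdir(parents=True, exist_ok=True)
    # Passing a string to convert is documented as the path of the converted file.
    return vaex.from_csv(str(csv_path), convert=str(target))
```

Calling convert_csv('path/to/file.csv', '/kaggle/working') should then write file.hdf5 into /kaggle/working instead of next to the input. I have not checked this against every vaex version, so treat it as a starting point.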
