
I am dealing with text files containing space-separated values:

File1.txt: text1 text2 text3 text4 ...

I would like to read each file into a vaex df and concatenate it to a running total vaex df, which I call TOTALDF.

I want to use vaex rather than pandas because the df is too big to fit in memory.

I posted another question related to the issue here: python: How concatenate pandas dataframes with VAEX

So I have a partial solution (a workaround), i.e. creating hdf5 dataframes. But that takes far too much time. The question is whether I can concatenate vaex dfs read from CSV files, one by one, into this total one.

JFerro
  • 1
    Probably your best bet (assuming you are on mac/linux) is to use command line tools to create a single large CSV file, e.g. `$ cat file1.txt file2.txt file3.txt > final.txt`, then use vaex (`vaex.open('final.txt')`) to work with the file. – Joco Oct 23 '22 at 18:16
  • 1
    Actually, you do not need to do what I suggested earlier. If the CSV files are regular and do not need any special parsing, you can do `df = vaex.open('my_files*.csv')` and vaex will open them all lazily and concatenate the result. Then, for performance reasons, you can export them to hdf5 or whatever. – Joco Oct 25 '22 at 12:01
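
For anyone not on mac/linux, the `cat` step from the first comment can be sketched in plain Python. This is only a minimal sketch using the standard library: the helper name `concatenate_text_files` is made up for illustration, and it assumes the files have no header row and share the same column layout. Unlike plain `cat`, it adds a newline after any file that does not end with one, so the last row of one file is not glued to the first row of the next.

```python
import os
import shutil
from pathlib import Path


def concatenate_text_files(sources, destination):
    """Stream every file in `sources` into `destination`, in order.

    Hypothetical helper: a portable stand-in for
    `cat file1.txt file2.txt ... > final.txt`.
    """
    destination = Path(destination)
    with destination.open("wb") as out:
        for source in sources:
            with Path(source).open("rb") as src:
                # copyfileobj copies in fixed-size chunks, so no input
                # file is ever loaded into memory in full.
                shutil.copyfileobj(src, out)
                # If a non-empty file lacks a trailing newline, add one
                # so its last row stays separate from the next file.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return destination
```

Calling `concatenate_text_files(["file1.txt", "file2.txt", "file3.txt"], "final.txt")` should produce the same single file as the `cat` command, which can then be opened as the comment suggests.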

0 Answers