
I have a folder of .txt files totalling 52.6 GB. The .txt files are located in various subfolders, and each subfolder has a unique label ("F", "G", etc.) and contains many .txt files. I need to combine all the .txt files of each label ("F", "G") into one single file. I tried to use vaex, but I could not find a way to do this for .txt files. Can anyone please help me out?

shadow kh
    Hello, why not write a Python script to merge the files? E.g. list all the folders you have, read the files from each one, and merge them into the required final file. – OmG3r Feb 22 '21 at 20:18

1 Answer


Provided the text files contain CSV-formatted data with the same structure across files, you could use:

df = vaex.open_many([fpath1, fpath2, ..., fpathX])

To fetch all the filenames and their paths, you can conveniently use pathlib to recursively glob the file paths:

from pathlib import Path

txt_files = Path('your_label_folder_path').rglob('*.txt')

# since this returns a generator and vaex.open_many expects a list 
# and while we're here, resolve the absolute path as well
txt_files = [txt.absolute() for txt in txt_files]

df = vaex.open_many(txt_files)
radupm