I have a use-case where I need to concatenate a large number of CSV files into one, maintaining the order of the rows.
For example:
> cat file1.csv
1,bla,bla
2,bla,bla
> cat file2.csv
2,bla,bla
2,bla,bla
3,bla,bla
> cat desired_output.txt
1,bla,bla
2,bla,bla
2,bla,bla
2,bla,bla
3,bla,bla
Currently I'm doing this serially: reading each file in turn and appending it to a single concatenated file (reading and writing in binary mode for speed).
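Roughly, the serial version looks like this (the file names are placeholders for the real inputs):

import shutil

# Inputs in the required order.
files = ["file1.csv", "file2.csv"]

# Stream each file into the output in sequence, in binary mode
# to skip any text decoding/encoding overhead.
with open("desired_output.txt", "wb") as out:
    for path in files:
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)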
Since the machine I'm using has multiple cores available, I was wondering whether there's an easy way in base Python (joblib/pandas is also fine) to build some sort of aggregation tree, so that partial files are merged in parallel and the output is again a single CSV with the rows in order.
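To illustrate, here's a minimal sketch of the kind of merge tree I have in mind, using only the standard library (concat_pair and tree_concat are hypothetical names, and the temp-file handling is simplified):

import shutil
import tempfile
from concurrent.futures import ProcessPoolExecutor

def concat_pair(left: str, right: str) -> str:
    """Concatenate two files into a new temp file; return its path."""
    fd, out_path = tempfile.mkstemp(suffix=".csv")
    with open(fd, "wb") as out:
        for path in (left, right):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path

def tree_concat(paths: list[str]) -> str:
    """Pairwise-concatenate files in parallel until one file remains."""
    with ProcessPoolExecutor() as pool:
        while len(paths) > 1:
            # Pair adjacent files so the overall row order is preserved.
            merged = list(pool.map(concat_pair, paths[0::2], paths[1::2]))
            if len(paths) % 2:  # odd file out: carry it to the next level
                merged.append(paths[-1])
            paths = merged
    return paths[0]

tree_concat would be called under an if __name__ == "__main__": guard (required for process pools on some platforms). This sketch leaves the intermediate temp files on disk, and since the work is mostly I/O-bound I'm not sure how much a process pool actually buys, which is partly what I'm asking.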