
I know that reading a csv file into a datatable Frame is much faster than reading it into a pandas DataFrame.

However, in my case I have several csv files and I have to append them one by one. Right now I read each one with pd.read_csv(file) and append it to an initially empty DataFrame.

Would it be faster to read each csv file with datatable, append it to an empty datatable Frame, and then convert the final Frame back to csv?

In short, I want to know the fastest way to append csv files, other than using a pandas DataFrame.
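Roughly, what I am doing now looks like this (the file names here are just placeholders):

import pandas as pd

# placeholder list of csv paths
files = ['a.csv', 'b.csv', 'c.csv']

# read each file and append it to an initially empty DataFrame
df = pd.DataFrame()
for f in files:
    df = pd.concat([df, pd.read_csv(f)], ignore_index=True)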

MCPMH
  • Kindly share code of how you do it in datatable vs pandas, so we can better understand what you mean and suggest possible solutions. Also, kindly change the tag to py-datatable – sammywemmy Oct 26 '21 at 19:20
  • Also, you can read in all the files with either `fread` or `iread` (lazy reading of multiple files), and combine them into one with `dt.rbind`. Again, sharing your code, or some sample dataframes would be helpful. Not sure if this [blog post](https://samukweku.github.io/data-wrangling-blog/python/pydatatable/pandas/2020/11/05/Read-Multiple-Csv-Files-into-one-Table-in-Python.html#Via-iread) I authored will be helpful. – sammywemmy Oct 27 '21 at 00:13

1 Answer


This is what I do when I have lots of csv files.

I use glob to grab all the csv file paths:

from glob import glob

# collect the paths of all csv files in the folder
all_csvs = glob('path-to-folder-containing-csv-files/*.csv')

Now read all of them lazily with iread and append them with rbind:

import datatable as dt
from datatable import iread

# lazily read every csv and row-bind the resulting frames into one
all_csvs_appended = dt.rbind(iread(all_csvs))

If your csv files do not all have the same columns, you may need to pass force=True to rbind.
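If you then need a single csv file (or a pandas DataFrame) at the end, the combined frame can be written out directly. A minimal sketch, assuming the all_csvs_appended frame from above and a hypothetical output path:

# write the combined frame back out as a single csv file
all_csvs_appended.to_csv('combined.csv')

# or convert to a pandas DataFrame if you need one downstream
df = all_csvs_appended.to_pandas()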

Kay