I have a loop that goes through every file (xlsx and csv) in a directory, reads each one, appends it to a list, and concatenates the list into a single DataFrame. That result is then sent to a database. It works, but when it comes to bigger files it can take a lot of time. I figure this is the part of my code I need to improve:
xls.to_sql(table, con=engine, if_exists='append', index=False)
I've been testing different chunksizes, but how can I choose an optimal chunksize when the files come in all kinds of sizes?
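To make it concrete, this is roughly what I have in mind (a minimal sketch; the directory, table name, and connection URL are placeholders, and the chunk-sizing rule is just one common heuristic of keeping rows * columns under the driver's bound-parameter limit):

    from pathlib import Path

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://user:password@localhost/mydb")  # placeholder URL
    table = "my_table"  # placeholder table name

    for path in Path("data").glob("*"):
        if path.suffix == ".xlsx":
            df = pd.read_excel(path)
        elif path.suffix == ".csv":
            df = pd.read_csv(path)
        else:
            continue

        # Derive the chunksize from the data instead of hard-coding one number:
        # many drivers cap the total bound parameters per statement (e.g. ~2100
        # for SQL Server), so rows_per_chunk * n_columns should stay below that.
        rows_per_chunk = max(1, 2000 // max(1, len(df.columns)))

        df.to_sql(table, con=engine, if_exists="append", index=False,
                  chunksize=rows_per_chunk, method="multi")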
I've also been studying parallelization and was wondering if anyone could help me figure out how to apply it here.
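This is the kind of parallelization I'm considering (again just a sketch with placeholder paths and URL): I'm assuming the slow part is reading and parsing the files, so those run in worker processes while the inserts stay in the main process, since a DBAPI connection generally shouldn't be shared across processes.

    from concurrent.futures import ProcessPoolExecutor
    from pathlib import Path

    import pandas as pd
    from sqlalchemy import create_engine

    def read_one(path):
        # Parse a single file into a DataFrame (runs in a worker process).
        if path.suffix == ".xlsx":
            return pd.read_excel(path)
        return pd.read_csv(path)

    if __name__ == "__main__":
        engine = create_engine("postgresql://user:password@localhost/mydb")  # placeholder
        paths = [p for p in Path("data").glob("*") if p.suffix in (".xlsx", ".csv")]

        with ProcessPoolExecutor() as pool:
            # Files are parsed in parallel; results are written to the DB one by one.
            for df in pool.map(read_one, paths):
                df.to_sql("my_table", con=engine, if_exists="append",
                          index=False, chunksize=1000, method="multi")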
EDIT: By the way, I've also followed this blog post to improve my connection to the database, and I've noticed a big performance improvement from that. But to_sql is still lacking in performance.
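One thing I've been looking at, in case it's relevant: if the target database happens to be PostgreSQL (an assumption on my part, since I haven't said which engine I use), a custom method= callable that routes the insert through COPY is supposed to be much faster than plain INSERTs. This is adapted from the example in the pandas to_sql documentation:

    import csv
    from io import StringIO

    def psql_insert_copy(table, conn, keys, data_iter):
        # Write the chunk to an in-memory CSV and bulk-load it with COPY (psycopg2).
        dbapi_conn = conn.connection
        with dbapi_conn.cursor() as cur:
            buf = StringIO()
            writer = csv.writer(buf)
            writer.writerows(data_iter)
            buf.seek(0)

            columns = ", ".join('"{}"'.format(k) for k in keys)
            if table.schema:
                table_name = "{}.{}".format(table.schema, table.name)
            else:
                table_name = table.name

            sql = "COPY {} ({}) FROM STDIN WITH CSV".format(table_name, columns)
            cur.copy_expert(sql=sql, file=buf)

    # xls.to_sql(table, con=engine, if_exists='append', index=False, method=psql_insert_copy)

Would something like that be the right direction, or is parallelizing the reads the better place to spend effort?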