
I have two tables, one with 1M rows and one with 10M rows (10 columns each), that I am uploading to a Postgres DB using pandas.to_sql.

I noticed that using method="multi" severely reduces performance, by 5 to 6x, with chunksizes of 5k, 10k, and 20k. Even reducing the chunksize to 100 didn't help.

Is this normal with Postgres? 10 columns is not a lot for performance to be affected this badly.
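For reference, a minimal sketch of the setup being described; the connection string, table name, and synthetic DataFrame are placeholders, not details from the question:

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; substitute real credentials.
engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/mydb")

# 1M rows x 10 columns of random floats, standing in for the real data.
df = pd.DataFrame(np.random.rand(1_000_000, 10),
                  columns=[f"col{i}" for i in range(10)])

# The slow path from the question: multi-values INSERT statements.
df.to_sql("my_table", engine, if_exists="replace", index=False,
          method="multi", chunksize=10_000)
```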

Courvoisier
  • It doesn't look normal to me; I can't reproduce it. But why do you care? If you already know what is fastest, just use that. And why would the number of columns make any difference? – jjanes Jul 19 '22 at 13:15
  • Normally a large number of columns makes `multi`'s performance bad. Also, I read that `multi` should make the inserts super fast. I just want to know what the best config is. – Courvoisier Jul 19 '22 at 22:43
  • You are correct, it does degrade quickly as the number of columns increases, due to a really bad implementation in pandas. But with PostgreSQL (contrary to the pandas docs), `None` doesn't insert one row at a time; it does the same thing that `'multi'` does, except without the bad implementation. However, with `None`, a chunksize above 1000 appears to be treated as 1000 (see the timing sketch after these comments). – jjanes Jul 20 '22 at 21:45
  • Ah, that's why there were no noticeable differences when I used chunksizes above 1000. – Courvoisier Jul 21 '22 at 14:34
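Based on jjanes's observation, a quick way to compare the two settings on your own setup (this reuses the `engine` and `df` from the earlier sketch, and uses chunksize=1000 since larger values appear to be capped when method is `None`; timings will vary with hardware and network):

```python
import time

# Time the default (None) against method="multi" on the same data.
for method in (None, "multi"):
    start = time.perf_counter()
    df.to_sql("my_table", engine, if_exists="replace", index=False,
              method=method, chunksize=1_000)
    print(f"method={method!r}: {time.perf_counter() - start:.1f}s")
```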

0 Answers