I export a large DataFrame (18 million observations, 5 columns) called SalesData to Stata's native file format using the pandas to_stata method:
SalesData.to_stata(sales)
It works, but it is so slow that it is not usable in production. I think I understand why: examining the resulting Stata file shows that pandas assigns every string column a width of 244 characters, regardless of the actual content of the column, so the Stata file is needlessly huge. Running Stata's "compress" command on that file reduces its size by a factor of 10, without any data loss.
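To show how far the real contents are from 244 characters, here is a minimal sketch of how the longest string in each text column can be measured. The helper name max_string_widths and the toy frame are made up for illustration; they are not part of pandas or of my real data.

```python
import pandas as pd


def max_string_widths(df):
    """Return the length of the longest value in each text column.

    Hypothetical helper for illustration only, not a pandas function.
    """
    widths = {}
    for name in df.columns:
        col = df[name]
        # Only consider columns that hold strings (object or string dtype).
        if col.dtype == object or pd.api.types.is_string_dtype(col):
            # astype(str) means a missing value is measured as the text "nan".
            widths[name] = int(col.astype(str).str.len().max())
    return widths


# Toy stand-in for SalesData (assumed column names).
toy = pd.DataFrame({"store": ["A1", "B22", "C333"], "units": [1, 2, 3]})
print(max_string_widths(toy))  # {'store': 4}
```

On the real data, the widths reported this way should be the ones Stata ends up with after "compress", which is why the fixed 244-character width looks wasteful.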
I can't find any option of the to_stata method that controls this behaviour.
Any suggestions? Thanks