I'm testing out feather-format as a way to store pandas DataFrames. Writing with feather seems to be extremely slow when a column consists entirely of None (df.info() reports it as 0 non-null object). The following code captures the issue well:
import pandas as pd
df1 = pd.DataFrame(data={'x': 1000*[None]})
%timeit df1.to_feather('.../x.feather')
5.35 s ± 303 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df1.to_pickle('.../x.pkl')
734 ms ± 60.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df1.to_parquet('.../x.parquet')
200 ms ± 5.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I'm using feather-format 0.4.0, pandas 0.23.4, and pyarrow 0.13.0.
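For anyone who wants to reproduce this outside IPython (where %timeit is not available), here is a rough plain-Python sketch using the standard timeit module. The helper names best_time and benchmark_writers are hypothetical, the files go into a temporary directory instead of the elided path above, and it assumes pyarrow is the Feather/Parquet engine: those two writers are skipped if pyarrow is not installed.

```python
import os
import tempfile
import timeit

import pandas as pd


def best_time(write, repeat=3):
    """Best-of-`repeat` wall-clock seconds for a single call of `write()`."""
    return min(timeit.repeat(write, number=1, repeat=repeat))


def benchmark_writers(df, repeat=3):
    """Time each available writer on `df`, writing into a temporary directory.

    Assumption: pyarrow backs to_feather/to_parquet; both are skipped
    when pyarrow cannot be imported, leaving only the pickle timing.
    """
    writers = {"pickle": lambda path: df.to_pickle(path)}
    try:
        import pyarrow  # noqa: F401  (only checking availability)
        writers["feather"] = lambda path: df.to_feather(path)
        writers["parquet"] = lambda path: df.to_parquet(path)
    except ImportError:
        pass

    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, write in writers.items():
            path = os.path.join(tmp, "x." + name)
            # The lambda is consumed inside this iteration, so late binding
            # of `write` and `path` is not a problem here.
            results[name] = best_time(lambda: write(path), repeat)
    return results


if __name__ == "__main__":
    df1 = pd.DataFrame(data={"x": 1000 * [None]})
    for name, seconds in benchmark_writers(df1).items():
        print(f"{name}: {seconds:.3f} s")
```

Taking the minimum over repeats rather than the mean is a deliberate choice: it is less sensitive to background noise, though it will not match the mean ± std. dev. that %timeit reports.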
How can I write DataFrames like this to feather without it taking so long?
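One workaround that might be worth timing, assuming it is acceptable for an empty column to lose its object dtype, is to cast every all-null object column to float64 (so each None becomes NaN) before calling to_feather. This is only a sketch: the helper names all_null_object_columns and cast_all_null_columns are hypothetical, and I have not confirmed that it avoids the slowdown with pyarrow 0.13.0.

```python
import pandas as pd


def all_null_object_columns(df):
    """Names of object-dtype columns whose values are all missing (None/NaN)."""
    return [
        col for col in df.columns
        if df[col].dtype == object and df[col].isna().all()
    ]


def cast_all_null_columns(df, dtype="float64"):
    """Return a copy of `df` with every all-null object column cast to `dtype`.

    Hypothetical workaround: assumes losing the object dtype is acceptable.
    With the default float64, each None becomes NaN.
    """
    cols = all_null_object_columns(df)
    if not cols:
        return df.copy()
    return df.astype({col: dtype for col in cols})
```

You would then call df1 = cast_all_null_columns(df1) before the same to_feather call as above. Note that reading the file back gives a float64 column of NaN rather than an object column of None, so any code that checks for None specifically would need adjusting.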