Proper way of writing and reading Dataframe to file in Python

Question

I would like to write and later read a dataframe in Python.

df_final.to_csv(self.get_local_file_path(hash,dataset_name), sep='\t', encoding='utf8')
...
df_final = pd.read_table(self.get_local_file_path(hash,dataset_name), encoding='utf8',index_col=[0,1])

But then I get:

sys:1: DtypeWarning: Columns (7,17,28) have mixed types. Specify dtype option on import or set low_memory=False.

I found this question, whose bottom line is that I should specify the field types when I read the file because "low_memory" is deprecated... I find that very inefficient.

Isn't there a simple way to write & later read a Dataframe? I don't care about the human-readability of the file.

Mike Müller · Accepted Answer · 2017-08-21T06:39:28.890

You can pickle your dataframe:

df_final.to_pickle(self.get_local_file_path(hash,dataset_name))

Read it back later:

df_final = pd.read_pickle(self.get_local_file_path(hash,dataset_name))
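As a minimal sketch of that round trip, the snippet below pickles a small frame and reads it back. The frame, its column names, and the file name are made up for illustration; only `to_pickle` and `read_pickle` come from the answer above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for df_final: a two-level index plus a column
# that holds mixed Python types, similar to what triggers the warning.
df_final = pd.DataFrame(
    {
        "group": ["a", "a", "b"],
        "item": [1, 2, 1],
        "value": [1.5, 2.5, 3.5],
        "mixed": [1, "x", 2.0],
    }
).set_index(["group", "item"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_final.pkl")  # hypothetical file name
    df_final.to_pickle(path)
    restored = pd.read_pickle(path)

# Raises an AssertionError if index, dtypes, or values differ.
pd.testing.assert_frame_equal(df_final, restored)
```

No `index_col` or `dtype` arguments are needed on the way back in, because the index and column types are stored in the file. Note that pickle files should only be loaded from sources you trust.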

If your dataframe is big and this gets too slow, you might have more luck using the HDF5 format:

# key names the dataset inside the HDF5 file; 'df' is an arbitrary choice
df_final.to_hdf(self.get_local_file_path(hash,dataset_name), key='df')

Read it back later:

df_final = pd.read_hdf(self.get_local_file_path(hash,dataset_name), key='df')

You might need to install PyTables first.

Both ways store the data along with their types. Therefore, this should solve your problem.

score 0 · Answer 2 · answered Aug 21 '17 at 06:36

The warning appears because pandas has detected values of conflicting types within a column. If you wish, you can specify the data types when you read the file, via the dtype argument of pd.read_table:

,dtype={'FIELD':int,'FIELD2':str}

Etc.
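A small sketch of how such a dtype mapping might be passed when reading. The sample data is invented, the column names reuse the placeholders above, and `pd.read_csv` with `sep="\t"` is used here as the equivalent of `pd.read_table`.

```python
import io

import pandas as pd

# Hypothetical tab-separated data; 'FIELD2' mixes numbers and text.
raw = "FIELD\tFIELD2\n1\t10\n2\tabc\n3\t30\n"

# Each column listed in dtype is parsed as the requested type,
# so pandas does not have to guess and 'FIELD2' stays text throughout.
typed = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    dtype={"FIELD": int, "FIELD2": str},
)

print(typed.dtypes)
```

Columns left out of the mapping are still inferred, so only the columns named in the warning need an entry.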

Tim Seed
