I would like to write and later read a dataframe in Python.

df_final.to_csv(self.get_local_file_path(hash,dataset_name), sep='\t', encoding='utf8')
...
df_final = pd.read_table(self.get_local_file_path(hash,dataset_name), encoding='utf8',index_col=[0,1])

But then I get:

sys:1: DtypeWarning: Columns (7,17,28) have mixed types. Specify dtype option on import or set low_memory=False.

I found this question, whose bottom line is that I should specify the field types when I read the file, because "low_memory" is deprecated... I find that very inefficient.

Isn't there a simple way to write & later read a Dataframe? I don't care about the human-readability of the file.

Mike Müller
Guy s

2 Answers

You can pickle your dataframe:

df_final.to_pickle(self.get_local_file_path(hash,dataset_name))

Read it back later:

df_final = pd.read_pickle(self.get_local_file_path(hash,dataset_name))
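A minimal sketch of that round trip, assuming a small made-up frame with a two-level index and one mixed-type column (the column names and the file name are hypothetical, not taken from the question):

```python
import os
import tempfile

import pandas as pd

# Hypothetical example frame: two-level index plus a column with mixed types.
df_final = pd.DataFrame(
    {
        "group": ["a", "a", "b"],
        "item": [1, 2, 3],
        "value": [1.5, 2.5, 3.5],
        "mixed": [1, "two", 3.0],
    }
).set_index(["group", "item"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_final.pkl")  # hypothetical file name
    df_final.to_pickle(path)          # write the frame, dtypes and index included
    restored = pd.read_pickle(path)   # read it back, no dtype guessing involved

# Compare the restored frame with the original.
same = restored.equals(df_final)
```

Because nothing is re-parsed from text, there is no type inference on read and therefore no DtypeWarning. Keep in mind that pickle files should only be loaded from sources you trust.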

If your dataframe is big and this gets too slow, you might have more luck using the HDF5 format:

df_final.to_hdf(self.get_local_file_path(hash,dataset_name), key='df')

Read it back later:

df_final = pd.read_hdf(self.get_local_file_path(hash,dataset_name))

You might need to install PyTables first.

Both ways store the data along with their types. Therefore, this should solve your problem.

Mike Müller

The warning appears because pandas has detected values of conflicting types within a column. If you wish, you can specify the data types when reading the file by passing a dtype mapping to pd.read_table:

,dtype={'FIELD':int,'FIELD2':str} 

Etc.
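As a hedged illustration of the dtype mapping, the sketch below reads a small made-up tab-separated table from memory. The column names and values are assumptions for demonstration only; pd.read_csv with sep="\t" is used here and accepts the same dtype argument as pd.read_table.

```python
import io

import pandas as pd

# Hypothetical tab-separated data: "code" mixes numeric-looking and text values.
raw = "id\tcode\tamount\n1\t007\t1.5\n2\tabc\t2.0\n3\t42\t3.25\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    # Declare each column's type up front so pandas does not have to infer it.
    dtype={"id": int, "code": str, "amount": float},
)

# "code" is read as text, so leading zeros are preserved.
first_code = df.loc[0, "code"]
```

Declaring a mixed column as str keeps every value as text, which avoids the type conflict that triggers the warning.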

Tim Seed