
What are the advantages of saving a file in .pkl format over .txt or .csv format in Python?

dumbledad
Tanvi Mirza
  • Is your question about https://docs.python.org/3/library/pickle.html? – Mar 19 '18 at 20:47
  • My question is: what are the advantages of a .pkl file over .txt and .csv? I got code from a client in which some dictionaries and data frames are saved in .pkl format. The .pkl file contains around 30,000,000. I would like to save the data in .txt format, as I'm going to use it in PySpark and I can't find a way to read a .pkl file there. If .pkl has real benefits over .txt, I will search more extensively for a way to read .pkl files in PySpark; otherwise I will save the data in .txt format and read that with PySpark. – Tanvi Mirza Mar 20 '18 at 10:19
  • See also [What is the difference between save a pandas dataframe to pickle and to csv?](https://stackoverflow.com/q/48770542/575530) – dumbledad Jun 18 '19 at 10:07
  • Matthew Rocklin does an analysis [here](http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization) which suggests that CSV is often faster! – dumbledad Jun 18 '19 at 10:11
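
If the goal is just to get pickled data into a text format that Spark can read, one option is to unpickle it in plain Python and write it back out as CSV. A minimal sketch, assuming the pickle holds a flat dictionary; the file names and the sample data here are hypothetical stand-ins for the client's files:

```python
import csv
import os
import pickle
import tempfile

# Hypothetical sample data standing in for the client's pickled dictionary.
data = {"alice": 3, "bob": 5}

workdir = tempfile.mkdtemp()
pkl_path = os.path.join(workdir, "data.pkl")
csv_path = os.path.join(workdir, "data.csv")

# Create the pickle (in the real case this file already exists).
with open(pkl_path, "wb") as f:
    pickle.dump(data, f)

# Load it back. Only unpickle files that come from a source you trust.
with open(pkl_path, "rb") as f:
    loaded = pickle.load(f)

# Export as CSV, one key/value pair per row, with a header line.
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["key", "value"])
    writer.writerows(loaded.items())

# Read the CSV back to see what a text-based consumer would receive.
with open(csv_path, newline="") as f:
    rows = list(csv.reader(f))
```

Note that every value comes back from the CSV as a string, so the reader has to restore the types. For a pickled pandas DataFrame, `pandas.read_pickle` followed by `DataFrame.to_csv` should do the same job.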

2 Answers


A .pkl (pickle) file can serialize a very wide range of Python objects, not just text data.

krflol
  • Can you please elaborate with some examples? I'm looking at Python code where a data frame and a dictionary are saved in .pkl format, and I'm trying to understand why they need to be saved as .pkl rather than .txt or .csv. – Tanvi Mirza Mar 19 '18 at 19:32
  • Using pickle saves the dataframe as a dataframe object rather than exporting it. It can be faster, and if you intend to load it as a dataframe later, there is no real reason to go df -> csv -> df. Saving to CSV, for example, requires you to specify the index label for the dataframe. Not a big deal, but by using pickle you're serializing the dataframe as a dataframe. – krflol Mar 19 '18 at 19:39
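
To illustrate the point about serializing an object as itself, here is a small sketch using only the standard library (so a plain dictionary rather than a dataframe; the `record` contents are made up for the example). Pickle gives back the same types, while a CSV round trip gives back strings:

```python
import csv
import datetime
import io
import pickle

# A made-up record mixing types that plain text formats cannot represent directly.
record = {
    "id": 7,
    "tags": {"a", "b"},                      # a set
    "created": datetime.date(2018, 3, 19),   # a date object
    ("x", "y"): [1.5, None],                 # a tuple used as a key
}

# Pickle round trip: the object comes back with its original types.
restored = pickle.loads(pickle.dumps(record))

# CSV round trip of two of the fields: everything comes back as text.
buffer = io.StringIO()
csv.writer(buffer).writerow([record["id"], record["created"]])
buffer.seek(0)
csv_row = next(csv.reader(buffer))
```

After running this, `restored == record` holds and `restored["created"]` is still a `datetime.date`, whereas `csv_row` contains only strings that would need to be parsed again.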

Although you only ask about the advantages, I would like to mention the disadvantages first.

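Cons:

  • CSV is far more widely used and can be opened by almost any tool, not only Python.
  • pkl can be a security problem: unpickling a file from an untrusted source can execute arbitrary code.

Pros:

  1. PKL can be faster to save and load, since nothing has to be converted to text and parsed back.

  2. PKL can store almost any Python object in binary form, not just text.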

    JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format;
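
A short sketch of both points, the text/binary difference and the security concern. The `Sneaky` class is hypothetical and deliberately harmless: it only shows that loading a pickle can call a function chosen by whoever created the file.

```python
import json
import pickle

payload = {"name": "pkl", "sizes": [1, 2, 3]}

# JSON produces text (str); pickle produces binary data (bytes).
as_json = json.dumps(payload)
as_pickle = pickle.dumps(payload)


class Sneaky:
    """Hypothetical class whose pickle calls a function when it is loaded."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        # A malicious file could name a dangerous function here instead.
        return (sorted, ([3, 1, 2],))


# Loading does not return a Sneaky instance; it returns whatever the call returned.
result = pickle.loads(pickle.dumps(Sneaky()))
```

Here `result` is the sorted list, not a `Sneaky` object, which is why pickle files should only be loaded when you trust where they came from.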

Luk Aron
  • pkl files are likely to take much more space on disk than the corresponding txt or JSON files as well. https://stackoverflow.com/questions/30253976/pickling-pandas-dataframe-multiplies-by-5-the-file-size – Rafael Mar 02 '23 at 09:59