What are the advantages of saving a file in .pkl format over .txt or .csv format in Python?
Viewed 6,528 times
4
-
Is your question about https://docs.python.org/3/library/pickle.html? – Mar 19 '18 at 20:47
-
My question is: what are the advantages of a .pkl file over .txt and .csv? I got some code from a client in which some of the dictionaries and data frames are saved in .pkl format. The .pkl file contains around 30,000,000. I would like to save the data in .txt format, because I'm going to use the data in PySpark and I can't find a way to read a .pkl file in PySpark. So if I can learn the benefits of .pkl over .txt, I will search extensively for a way to read .pkl files in PySpark; otherwise I will save the data in .txt format and read it with PySpark. – Tanvi Mirza Mar 20 '18 at 10:19
-
See also [What is the difference between save a pandas dataframe to pickle and to csv?](https://stackoverflow.com/q/48770542/575530) – dumbledad Jun 18 '19 at 10:07
-
Matthew Rocklin does an analysis [here](http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization) which suggests that CSV is often faster! – dumbledad Jun 18 '19 at 10:11
2 Answers
1
Pickle (the format behind .pkl files) can serialize a very wide range of Python objects, not just text data.
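As a minimal sketch of what that means (the `record` dictionary and its contents are made up for illustration), the standard-library `pickle` module round-trips nested Python objects with their types intact, whereas writing the same data as text keeps only a string representation:

```python
import datetime
import pickle

# Hypothetical data, invented for this example.
record = {
    "ids": {1, 2, 3},                        # a set
    "shape": (30, 4),                        # a tuple
    "created": datetime.date(2018, 3, 19),   # a date object
}

# Serialize to bytes and back (pickle.dump / pickle.load do the same with a file).
payload = pickle.dumps(record)
restored = pickle.loads(payload)

# Check that every value came back with the same type it had before.
same_types = all(type(restored[k]) is type(record[k]) for k in record)

# A text dump keeps only a string; the structure would have to be re-parsed by hand.
as_text = str(record)
```

Here `restored` compares equal to `record`, while `as_text` is just a `str` that you would have to parse yourself to get the set, tuple, and date back.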

krflol
-
Can you please elaborate by giving some examples? I'm looking at Python code where a data frame and a dictionary are saved in .pkl format. I'm trying to understand why we need to save these in a .pkl file rather than in a .txt or .csv file. – Tanvi Mirza Mar 19 '18 at 19:32
-
Using pickle saves the dataframe as a dataframe object rather than exporting it. It can be faster, and if you intend to load it as a dataframe later, there is no real reason to go df -> csv -> df. Saving to CSV, for example, requires you to handle the dataframe's index explicitly when writing it out and reading it back. Not a big deal, but by using pickle you're serializing the dataframe as a dataframe. – krflol Mar 19 '18 at 19:39
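To illustrate the round trip described in the comment above, here is a small sketch with a made-up dataframe (the column names, values, and file names are assumptions for the example). It uses pandas' `to_pickle`/`read_pickle` and `to_csv`/`read_csv`. The pickle round trip restores the index and dtypes as they were, while the CSV round trip needs `index_col` and reads the dates back as plain strings unless you ask for date parsing:

```python
import os
import tempfile

import pandas as pd

# Hypothetical dataframe, invented for this example.
df = pd.DataFrame(
    {"when": pd.to_datetime(["2018-03-19", "2018-03-20"]), "value": [1.5, 2.5]},
    index=pd.Index(["a", "b"], name="key"),
)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "frame.pkl")
    csv_path = os.path.join(tmp, "frame.csv")

    # Pickle: the dataframe is stored and restored as a dataframe.
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

    # CSV: the index is written as an ordinary column and must be named on read.
    df.to_csv(csv_path)
    from_csv = pd.read_csv(csv_path, index_col="key")

pickle_matches = from_pickle.equals(df)
csv_dates_kept = pd.api.types.is_datetime64_any_dtype(from_csv["when"])
```

In this sketch `pickle_matches` is true, while `csv_dates_kept` is false because nothing told `read_csv` to parse the `when` column as dates.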
1
Although you only ask about the advantages, I would like to mention the disadvantages first.
Con:
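- CSV is more widely supported: almost any tool or language can read it, while pickle is specific to Python.
- Pickle can be a security problem, because loading a pickle from an untrusted source can execute arbitrary code. See here.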
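A harmless sketch of the security concern (the `Sneaky` class is hypothetical and exists only for this demonstration): an object's `__reduce__` method can name any importable callable, and `pickle.loads` calls it while loading. Here the callable is the built-in `sorted`, but a malicious file could name something destructive instead.

```python
import pickle


class Sneaky:
    # __reduce__ tells pickle how to rebuild the object: it returns a callable
    # plus its arguments, and pickle.loads calls that callable during loading.
    def __reduce__(self):
        # Harmless stand-in; an attacker could name any importable callable.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Sneaky())

# Loading runs sorted([3, 1, 2]); no Sneaky instance is ever created.
result = pickle.loads(payload)
```

`result` is a sorted list rather than a `Sneaky` object, which shows that the callable ran at load time. This is why pickle files should only be loaded when they come from a source you trust.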
Pro:
- Pickle can be faster to save and load.
- Pickle can store almost any Python object in binary form.
JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format.
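The text-versus-binary distinction can be seen directly with the standard library. This is a small sketch with made-up sample data: `json.dumps` returns a `str`, `pickle.dumps` returns `bytes`, and JSON has no tuple type, so a tuple comes back as a list.

```python
import json
import pickle

data = {"name": "example", "values": [1, 2, 3]}  # hypothetical sample data

as_json = json.dumps(data)      # str: human-readable text
as_pickle = pickle.dumps(data)  # bytes: binary and Python-specific

# JSON maps tuples to arrays, so the tuple is lost on the way back.
json_tuple = json.loads(json.dumps((1, 2)))

# Pickle keeps the original Python type.
pickle_tuple = pickle.loads(pickle.dumps((1, 2)))
```

After running this, `json_tuple` is the list `[1, 2]` and `pickle_tuple` is the tuple `(1, 2)`.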

Luk Aron
-
pkl files can also take much more space on disk than the corresponding txt or JSON files. https://stackoverflow.com/questions/30253976/pickling-pandas-dataframe-multiplies-by-5-the-file-size – Rafael Mar 02 '23 at 09:59