I would like to write a list of Python dictionaries to a file. However, I need the dictionaries (and the lists within them) to remain dictionaries, i.e. when I load the file for processing I want to work with dictionaries, not strings.

Here is my sample code, which writes the data as strings. Is there a way to retain the original Python data structures? (In the real code, the list data has hundreds of dictionaries, each of which may have hundreds of lists as values.) I cannot simply pickle the data, for a number of reasons (one of which is that the file needs to be human readable).

import csv
import pandas as pd

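def write_csv_file(data, iteration):
    # In Python 3 the csv module needs a text-mode file opened with newline=''
    with open('%s.csv' % 'name', 'w', newline='') as data_csv:
        writer_data = csv.writer(data_csv, delimiter=',')
        for d in data:
            # each dict is converted to its string form when the row is written
            writer_data.writerow([iteration] + [d])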


data = [{'a':1, 'b':2}, {'e':[1], 'f':[2,10]}]
iteration = 1
write_csv_file(data, iteration)

At the moment I read the data file using pandas in the following manner to process the data.

d = pd.read_csv('name.csv')
d = pd.DataFrame(d)
user58925
  • That is **impossible**. Everything you *write to* or *read from* a file is a `string`. You will have to figure a clever way to go from `dict` to `string` and vice versa. Or use some kind of library. – Ma0 Dec 20 '17 at 16:32
  • You need some form of serialisation, e.g. `pickle` or `json`. – Norrius Dec 20 '17 at 16:33
  • @Ev.Kounis that's a stretch. How do you define string? To me, it's a human readable sequence of characters. However, files deal with bytes. You can write any sequence of bytes to a file, for example a serialized dictionary created with `pickle`. – timgeb Dec 20 '17 at 16:33
  • Ev. Kounis: Everything you write to or read from a file is a byte sequence, not a text string. And Python has several ways to serialize and deserialize its data structures to byte sequences, which are clever enough. – jsbueno Dec 20 '17 at 16:34

3 Answers

Just use pickle instead of CSV to write your data to a file: https://docs.python.org/3/library/pickle.html

import pickle

def write_pickle_file(data):
    # pickle is a binary format, so the file is opened in 'wb' mode
    with open('%s.pickle' % 'name', 'wb') as data_file:
        pickle.dump(data, data_file)

Pickle will correctly serialize and recover a whole host of data types, including date-times and most user-defined classes out of the box.
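
Reading the data back is the mirror image of writing it. The following is a minimal sketch, not part of the original answer: the helper names write_pickle_file and read_pickle_file and the temporary file location are chosen here purely for illustration.

```python
import os
import pickle
import tempfile


def write_pickle_file(data, path):
    # Pickle is a binary format, so the file is opened in 'wb' mode.
    with open(path, 'wb') as data_file:
        pickle.dump(data, data_file)


def read_pickle_file(path):
    # 'rb' mirrors the 'wb' mode used when writing.
    with open(path, 'rb') as data_file:
        return pickle.load(data_file)


data = [{'a': 1, 'b': 2}, {'e': [1], 'f': [2, 10]}]
# Hypothetical location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'name.pickle')

write_pickle_file(data, path)
restored = read_pickle_file(path)
print(restored == data)  # the dicts and lists come back as real objects
```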

However, if you need to edit the files manually with third-party tools, or want them to be human readable, it might not be the best choice.

If you only need numbers, strings, None, Booleans, lists and dictionaries, and would prefer a human-readable text file, then JSON can be a good choice. Python's json module uses the same interface as pickle, with the dump and load callables to write to and read from a file. In the code snippet above, replace pickle with json and open the file in text mode ('w' instead of 'wb'), and it will work the same with the data types listed. Moreover, check the docs for the indent argument, which pads the serialized output with indentation so that it is truly readable: https://docs.python.org/3/library/json.html
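
As a rough illustration of the JSON route, here is a small round trip using the sample data from the question. The file location and the indent value of 2 are assumptions made for this sketch, not something the answer prescribes.

```python
import json
import os
import tempfile

data = [{'a': 1, 'b': 2}, {'e': [1], 'f': [2, 10]}]
# Hypothetical location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'name.json')

# json.dump writes text, so the file is opened in text mode.
# indent=2 spreads the output over several indented lines.
with open(path, 'w') as data_file:
    json.dump(data, data_file, indent=2)

# json.load rebuilds the lists and dictionaries.
with open(path) as data_file:
    restored = json.load(data_file)

print(restored == data)
```

One thing to keep in mind with this approach: JSON object keys are always strings, so a dictionary with integer keys would come back with string keys, and tuples come back as lists.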

jsbueno

I think what you are trying to do is data serialization. One of the most common and well-known serialization formats is JSON, and Python has a module called json to read and write JSON files.

Here is a sample function to dump data into a JSON file (similar to the one written by @jsbueno in another answer):

import json

def write_json_file(data):
    # in Python 3, json.dump writes text, so open the file in 'w' mode, not 'wb'
    with open('%s.json' % 'name', 'w') as data_file:
        json.dump(data, data_file)
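
To load the file again, and to keep the iteration number from the question next to the data, something along these lines might work. This is only a sketch: the 'iteration' and 'data' keys, the read_json_file helper, the extra path parameter and the temporary file location are all assumptions introduced for the example.

```python
import json
import os
import tempfile


def write_json_file(data, iteration, path):
    # Wrap the list in one record so the iteration number travels with it.
    record = {'iteration': iteration, 'data': data}
    with open(path, 'w') as data_file:
        json.dump(record, data_file, indent=2)


def read_json_file(path):
    # Returns the iteration number and the original list of dictionaries.
    with open(path) as data_file:
        record = json.load(data_file)
    return record['iteration'], record['data']


data = [{'a': 1, 'b': 2}, {'e': [1], 'f': [2, 10]}]
# Hypothetical location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'name.json')

write_json_file(data, 1, path)
iteration, restored = read_json_file(path)
print(iteration, restored == data)
```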
running.t

Starting with Python 2.6 you can use ast.literal_eval.

>>> import ast
>>> ast.literal_eval('{"a":1, "b":2, "c":3}')
{'a': 1, 'b': 2, 'c': 3}

If a whole column of your pandas DataFrame holds dicts, you can save it to CSV normally (with a separator other than ,) and then map the dictionary-like strings in that column back into dictionaries:

df['DICTIONARY_COLUMN'].map(ast.literal_eval)

Naturally, you can ignore the DataFrame part: you can also transform the strings in a loop or any other way you like. The important parts are ast.literal_eval and a non-comma separator (because the dict-like strings contain commas).
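
For the version without a DataFrame, a minimal sketch using only the standard csv module could look like this. The ';' separator and the in-memory buffer standing in for a real file are assumptions made for the example.

```python
import ast
import csv
import io

data = [{'a': 1, 'b': 2}, {'e': [1], 'f': [2, 10]}]
iteration = 1

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';')
for d in data:
    # repr(d) is the dict-like string that ends up in the CSV cell.
    writer.writerow([iteration, repr(d)])

# Read the rows back and turn the second column into dictionaries again.
buffer.seek(0)
restored = [ast.literal_eval(row[1])
            for row in csv.reader(buffer, delimiter=';')]

print(restored == data)
```

ast.literal_eval only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans and None), so it does not execute arbitrary code the way eval would.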

jo9k