
I am currently writing a list of dictionaries like the one below to a CSV file:

tmp_res = [{"val1": 1.0, "val2": 2, "ar_1": [[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]] },....]

`ar_1` represents an *ndarray* of shape `[-1, 2]`, where `-1` is not constant across the dicts.

After reading the file back, I get the single values of `val1` and `val2` as expected; the array, however, comes back as one string that is not easily readable:

"[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"

I know I could parse that string and split it on certain characters. However, it feels like there should be a better and more elegant way to solve this problem.

What is the best way to save such data to a file and restore it?

EDIT: To clarify how I save and read the file: I write it with a `csv.DictWriter` in the following way:


# Exemplary Data:
results = [{'mean_iou': 0.3319194248978337, 'num_boxes': 1, 'centroids': [[101.21826171875, 72.79462432861328]]}, {'mean_iou': 0.4617333142965009, 'num_boxes': 2, 'centroids': [[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]}, {'mean_iou': 0.537150158582514, 'num_boxes': 3, 'centroids': [[50.82071304321289, 42.616580963134766], [304.91583251953125, 176.09994506835938], [140.43699645996094, 104.00206756591797]]}]

# The given results data is basically tmp_res after the for loop.
tmp_res = []
for i in range(len(results)):
    res_dict = {}
    res_dict["centroids"] = results[i]["centroids"]
    res_dict["mean_iou"] = results[i]["mean_iou"]
    res_dict["num_boxes"] = results[i]["num_boxes"]
    tmp_res.append(res_dict)

# Writing to File
keys = tmp_res[0].keys()
with open('anchor.csv', 'w', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(tmp_res)

# Reading from File

num_centroids = []
mean_ious = []
centroids = []
with open('anchor.csv') as csvfile:
    reader = csv.DictReader(csvfile,
                            fieldnames=["mean_iou",
                                        "num_boxes",
                                        "centroids"])
    # Skip the header line
    next(reader, None)
    for row in reader:
        centroids.append(row["centroids"])
        num_centroids.append(row["num_boxes"])
        mean_ious.append(row["mean_iou"])

An excerpt from the file looks as follows:

mean_iou,num_boxes,centroids
0.3319194248978337,1,"[[101.21826171875, 72.79462432861328]]"
0.4617333142965009,2,"[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
0.537150158582514,3,"[[50.82071304321289, 42.616580963134766], [304.91583251953125, 176.09994506835938], [140.43699645996094, 104.00206756591797]]"
0.5602804262309611,4,"[[49.9361572265625, 41.09553146362305], [306.10711669921875, 177.09762573242188], [88.86656188964844, 167.8087921142578], [151.82627868652344, 81.80717468261719]]"

I suspect that the `csv.DictWriter` doesn't know how to handle a list of multiple values: since the list contains commas, writing it verbatim would break the comma-separated format. It therefore wraps the data in a quoted string to avoid the conflict in structure.
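That quoting behavior can be demonstrated directly with the `csv` module (a minimal sketch using an in-memory buffer and made-up values):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# The list field is stringified; since its str() form contains commas,
# the csv module quotes the whole field to keep the row parseable.
writer.writerow([0.33, 1, [[101.2, 72.8]]])
print(buf.getvalue())  # 0.33,1,"[[101.2, 72.8]]"
```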


While reading through Serge's answer and your comments, I think that using a JSON structure instead of CSV is more functional for what I am looking for. It supports the nested structures I need quite easily.

However, I had thought the `csv.DictWriter` would be able to handle some sort of unwrapping of its own "to-string-wrapped" data.

Also sorry for the delay.


Solution: Serge's suggestion applied to the code:

# Added JSON support
import json

# Reading from File
num_centroids = []
mean_ious = []
centroids = []
with open('anchor.csv') as csvfile:
    reader = csv.DictReader(csvfile,
                            fieldnames=["mean_iou",
                                        "num_boxes",
                                        "centroids"])
    # Skip the header line
    next(reader, None)
    for row in reader:
        # json.loads turns the quoted string back into a nested list
        centroids.append(json.loads(row["centroids"]))
        num_centroids.append(row["num_boxes"])
        mean_ious.append(row["mean_iou"])
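For symmetry, the write side can serialize the nested list with `json.dumps` before handing it to `csv`, so both directions go through JSON. A round-trip sketch (the filename and data here are placeholders):

```python
import csv
import json

results = [{"mean_iou": 0.33, "num_boxes": 1, "centroids": [[101.2, 72.8]]}]

# Write: turn the nested list into a JSON string first.
with open("anchor_json.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["mean_iou", "num_boxes", "centroids"])
    writer.writeheader()
    for row in results:
        writer.writerow(dict(row, centroids=json.dumps(row["centroids"])))

# Read: json.loads restores the nested list structure.
with open("anchor_json.csv", newline="") as f:
    restored = [json.loads(r["centroids"]) for r in csv.DictReader(f)]

print(restored)  # [[[101.2, 72.8]]]
```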
    Please show the code you use to save and load the file, and maybe also an example of what is in the file. – MB-F Nov 06 '18 at 15:04
  • `tmp_res = [{"val1": 1.0, "val2": 2, "ar_1": [[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]] },....]` is not typical for a csv-file (_comma/character separated values_), but rather looks like a text file. Please improve the quality of your question. – albert Nov 06 '18 at 15:04
  • @albert this must be the Python structure which is written to the csv-file :) – MB-F Nov 06 '18 at 15:06
  • So are you content with the result saved as a string type instead of a list of lists? If so, just eval the string (dangerous but easy) or use a split. – Serge Nov 06 '18 at 15:31
  • @albert Shouldn't it be possible to save e.g. a list of dicts which contains a tuple to a csv file with the `csv.DictWriter` and extract it with a `csv.DictReader`? I have to admit it might not be the best idea, since it kind of breaks the csv structure by having more than one comma per column. However, the name kind of suggests it should be able to do that. – Twald Nov 06 '18 at 21:59

1 Answer


Your file is not in csv format, it is just a python dictionary. Either read the file into a string and use an eval statement (dangerous but easy), or write a custom parser: break the string into pieces, remove the commas and brackets, build the flat numbers, then reshape.
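The hand-rolled parsing route might look like this (a sketch; it simply strips the brackets, splits on commas, and reshapes to `[-1, 2]`):

```python
import numpy as np

s = "[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
# Remove the brackets, split on commas, convert to floats, then reshape.
flat = [float(x) for x in s.replace("[", "").replace("]", "").split(",")]
arr = np.array(flat).reshape(-1, 2)
print(arr.shape)  # (2, 2)
```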

Curiously, `"[[65.41156005859375, 53.709598541259766], ..."` looks like valid JSON, so `np.array(json.loads("[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"))` should result in an ndarray. Mind that `tmp_res = ...` is not valid JSON, so `json.load('myfile')` will fail.
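The JSON route in runnable form:

```python
import json
import numpy as np

s = "[[65.41156005859375, 53.709598541259766], [251.97698974609375, 153.14926147460938]]"
# json.loads parses the string into a nested list; np.array makes it an ndarray.
arr = np.array(json.loads(s))
print(type(arr).__name__, arr.shape)  # ndarray (2, 2)
```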

PS. CSV is intended for tabular data only, not multidimensional data. If you must, you can do a double CSV pass with the standard csv module and split:

s = "[[76 ... "
lines = s.split(']], [[')

# note: csv delimiters must be a single character
reader = csv.reader(lines, delimiter=',')

or use pandas.read_csv, which has a configurable lineterminator in C mode (single characters only).

I guess a better solution is storing the data as valid JSON (without any assignments). Or you can use the dedicated numpy.save / numpy.load to store binary data, for greater scalability.
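The numpy.save / numpy.load route, sketched for a single array (filename is a placeholder; for many ragged arrays, numpy.savez with one named array per row would be an option):

```python
import numpy as np

centroids = np.array([[65.4, 53.7], [252.0, 153.1]])
np.save("centroids.npy", centroids)   # binary .npy file; dtype and shape preserved
restored = np.load("centroids.npy")
print(np.array_equal(centroids, restored))  # True
```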

For other viable alternatives read

How can I serialize a numpy array while preserving matrix dimensions?


  • I think using valid JSON is the better alternative to using a `csv.DictWriter`. Also, using the json.loads function for my decoding works like a charm. However, I am still wondering if there is a `csv`-native way to solve this. – Twald Nov 06 '18 at 21:40
  • See above: you cannot solve the JSON reading problem in an elegant way with csv. The best you can do is a workaround: apply csv two times. – Serge Dec 18 '18 at 14:02