I was looking at the accepted answer of this:

Creating a dictionary from a CSV file

I want to create a dictionary in which the first row's different columns are keys with the corresponding next row's columns as values. This seems to do the trick, but I don't understand some parts of the code. This is it:

import csv

reader = csv.DictReader(open('values.csv'))

result = {}
for row in reader:
    for column, value in row.items():  # consider .iteritems() for Python 2
        result.setdefault(column, []).append(value)
        print(f"Column: {column}")
        print(f"Value: {value}")
print(result)

When I run this code, I get:

Column: Date
Value: 123
Column: Foo
Value: 456
Column: Bar
Value: 789
Column: Date
Value: abc
Column: Foo
Value: def
Column: Bar
Value: ghi
{'Date': ['123', 'abc'], 'Foo': ['456', 'def'], 'Bar': ['789', 'ghi']}

for the file:

Date,Foo,Bar
123,456,789
abc,def,ghi

It does the job correctly, then, so that Date is the key with the values in the same column rows under it, but I don't understand how that works in code.

What does `column, value in row.items()` do, exactly? Does it mean for every column in the row (separated by a comma), consider that a value? What does the `.items()` do (I looked at the documentation, but didn't get what "Returns a list containing a tuple for each key value pair" meant)?

Also, what does `result.setdefault(column, []).append(value)` do? I know append adds a value, but what does the syntax `.setdefault(column, [])` mean (in the documentation, it says "Returns the value of the specified key. If the key does not exist: insert the key, with the specified value", which I don't get either)?

Additionally, how did the program understand that the first row is that which I want to store the keys?

I've never done Python before, and so I apologize if this is a dumb question! I just want to make a dictionary for a database, and so this seems ideal, but I want to know what each line does. Thank you in advance!

Hana Ali
  • The `.setdefault(column, [])` is a way to get each key's value to be a list, which can then be appended to. `dict.setdefault` is somewhat tricksy. Nowadays it would be more idiomatic to use a `defaultdict(list)` from the [collections](https://docs.python.org/3/library/collections.html#collections.defaultdict) module to achieve the same effect. – snakecharmerb Aug 02 '20 at 16:00

1 Answer

1. What does column, value in row.items() do, exactly?

To answer this, we have to look at the line reader = csv.DictReader(open('values.csv')). A DictReader returns each row of the CSV as a dictionary whose keys are (by default) taken from the first row of the file, i.e. the column names. When iterating over the reader (for row in reader:), each row is a dictionary. To get key/value pairs out of a dictionary, you call its .items() method, which returns a view of (key, value) tuples that you can iterate over. Note that the csv module does not convert types, so every value is a string. In your case, when the reader reaches the second line of your sample CSV, row is equivalent to:

row = {
   "Date": "123",
   "Foo": "456",
   "Bar": "789"
}

So, in the first iteration of for column, value in row.items():, the tuple ("Date", "123") is unpacked: "Date" is assigned to column and "123" to value. We can do something with those values, and in the next iteration of this for loop (the next tuple produced by row.items()), "Foo" is assigned to column and "456" to value.
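As a rough illustration of the point about .items(), here is a small sketch that reads the sample file from the question and prints the pairs for the first data row. It assumes the CSV text is held in an in-memory string instead of values.csv so that it runs on its own; the names sample, first_row and pairs are made up for this example.

```python
import csv
import io

# The sample file from the question, kept in memory so the snippet is self-contained.
sample = "Date,Foo,Bar\n123,456,789\nabc,def,ghi\n"

reader = csv.DictReader(io.StringIO(sample))
first_row = next(reader)         # first data row, i.e. the second line of the file

pairs = list(first_row.items())  # one (key, value) tuple per column
print(pairs)
# [('Date', '123'), ('Foo', '456'), ('Bar', '789')]

# Tuple unpacking: each pair is split into two names on every iteration.
for column, value in pairs:
    print(column, "->", value)
```

Every value comes back as a string, so a numeric column would need an explicit int(value) or float(value) if you want numbers.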

2. Also, what does result.setdefault(column, []).append(value) do?

Note that the variable result starts out as an empty dictionary. This dictionary is going to hold lists of values; once the loop has finished, we want result to look like this:

result = {
   "Date": ["123", "abc"],
   "Foo": ["456", "def"],
   "Bar": ["789", "ghi"]
}

In the for loop that we are in (looping over the row dictionary), we now want to add the value "123" under the key "Date" in the result dictionary. However, in the first iteration the result dictionary is still empty! What happens is the following: result.setdefault(column, []) gets the value stored under the key column (note that this is the variable from the for loop!) from the dictionary called result; if that key does not exist, it first creates it with an empty list ([]) as its value and then returns that list. In the first iteration (where nothing is stored in result yet), column is "Date", and since that key doesn't exist yet in result, it is created with the value [] (an empty list).

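Then, chained onto that, .append(value) takes the value (in the first iteration, "123") and appends it to the list that setdefault returned. Because that list is the very same object that is stored in result, the dictionary is updated as well.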
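To make the setdefault step more concrete, the sketch below runs it by hand for a single column, then shows a longhand version and the defaultdict(list) variant mentioned in the comments on the question. The names dates, longhand, grouped and the hard-coded pairs list are only for illustration.

```python
from collections import defaultdict

pairs = [("Date", "123"), ("Date", "abc")]  # two rows' worth of one column

result = {}

# Key missing: setdefault stores [] under "Date" and returns that same list.
dates = result.setdefault("Date", [])
dates.append("123")

# Key present: the default is ignored and the stored list is returned.
result.setdefault("Date", []).append("abc")
print(result)  # {'Date': ['123', 'abc']}

# The same logic written out longhand.
longhand = {}
for column, value in pairs:
    if column not in longhand:
        longhand[column] = []
    longhand[column].append(value)

# defaultdict(list) creates the empty list automatically on first access.
grouped = defaultdict(list)
for column, value in pairs:
    grouped[column].append(value)
```

All three approaches end up with the same contents; which one reads best is mostly a matter of taste.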

3. Additionally, how did the program understand that the first row is that which I want to store the keys?

This is the trick of DictReader, which by default looks at your first line of the CSV and treats it as column names.
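If you want to see this behaviour for yourself, the following sketch compares the default with passing your own column names through the fieldnames parameter of DictReader. The in-memory sample string and the column names "a", "b", "c" are assumptions made purely for the example.

```python
import csv
import io

sample = "Date,Foo,Bar\n123,456,789\nabc,def,ghi\n"

# Default: the first line is consumed as the header.
reader = csv.DictReader(io.StringIO(sample))
print(reader.fieldnames)     # ['Date', 'Foo', 'Bar']
default_rows = list(reader)  # only the two data rows remain

# Explicit fieldnames: no header is read, so the first line is treated as data.
custom = csv.DictReader(io.StringIO(sample), fieldnames=["a", "b", "c"])
custom_rows = list(custom)
print(custom_rows[0])
```

With explicit fieldnames the header line shows up as an ordinary row, so you would normally only do this for files that have no header line.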

JarroVGIT
  • Thank you so much for thoroughly explaining! I do have a question though: Since `DictReader` automatically considers the first row the "keys," does that mean that for the tuple assignment, the column would always take the value of the key? As in, why is it that Date/Foo/Bar are always assigned as the keys of the tuples, and never values? – Hana Ali Aug 02 '20 at 16:34
  • By default, that is correct. You can, however, pass a list of custom column names via the `fieldnames` parameter when creating the `DictReader`. The tuple that is returned always follows the pattern `column, value`, so you can rely on the column names always being the keys in the `result` dictionary – JarroVGIT Aug 02 '20 at 16:45