
How can I use the same keys, but have different values in a dictionary?

I have a table of Jeopardy questions:

Show Number  Air Date    Round      Category         Value   Question    Answer
4680        12/31/2004  Jeopardy!   HISTORY          $200   Question 1  Copernicus
4680        12/31/2004  Jeopardy!   ESPN             $200   Question 2  Jim Thorpe
4680        12/31/2004  Jeopardy!   EVERYBODY TALKS  $200   Question 3  Arizona
4680        12/31/2004  Jeopardy!   THE COMPANY LINE $200   Question 4  McDonald's
4680        12/31/2004  Jeopardy!   EPITAPHS         $200   Question 5  John Adams

(Note: this is stored in a csv file. I just tried to show its layout above. It's available here, fyi)

Basically, I'm trying to build a list or dictionary that pairs each header with the corresponding value for every question, something like a variable that holds:

{'Show Number': 4680, 'Air Date': '12/31/2004', ... 'Answer': 'Copernicus'}
{'Show Number': 4680, 'Air Date': '12/31/2004', ... 'Answer': 'Jim Thorpe'}
{'Show Number': 4680, 'Air Date': '12/31/2004', ... 'Answer': 'Arizona'}

So later on, I can go through that structure and do things like get the unique values based on Category, Value, etc. Would that be a list of dictionaries?

I tried making it a dictionary - and it doesn't work. It only returns the last row's data. I understand why, because each time the row changes, it just starts back and updates the same keys with new info.

import csv

file_name = 'JEOPARDY_CSV.csv'

def get_data(csv_file):
    data = []
    with open(csv_file, 'r',  encoding="utf8") as read:
        reader = csv.reader(read)
        all_data = list(reader)
        data = all_data[1:]
        headers = all_data[0]
    return data, headers

def create_dict(data, headers):
    i = 0
    data_dict = {}
    for row in data:
        for col in row:
            data_dict[headers[i]] = col
            i+=1
        i = 0
    print(data_dict)

def main():
    file_data, headers = get_data(file_name)
    data_dictionary = create_dict(file_data[0:5], headers)

if __name__ == "__main__":
    main()

Again, the idea is to later have a function I can run to do things based on a column header, like "return all questions where show number is 4680", or "for all categories, return the unique ones".

BruceWayne

2 Answers

1

If some combination of columns uniquely identifies rows in this dataset (the primary key in relational database theory), you should include all of those columns in the dictionary's key. Searching on a key will be fast.

Alternatively, you can store non-unique data in a list of rows (list of dictionaries). Searching for a value will require looping through all rows in the list.
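As a rough illustration of both options, here is a minimal sketch using only the standard library. The sample text reuses three rows from the question written out as comma-separated values (an assumption about how the real file is laid out), and `load_rows` and `index_by_key` are hypothetical helper names. Whether Show Number, Category and Value together really identify a row uniquely in the full dataset is also an assumption.

```python
import csv
import io

# Three rows from the question, written as comma-separated text
# (assumed layout of the real CSV file).
SAMPLE_CSV = """Show Number,Air Date,Round,Category,Value,Question,Answer
4680,12/31/2004,Jeopardy!,HISTORY,$200,Question 1,Copernicus
4680,12/31/2004,Jeopardy!,ESPN,$200,Question 2,Jim Thorpe
4680,12/31/2004,Jeopardy!,EVERYBODY TALKS,$200,Question 3,Arizona
"""

def load_rows(csv_text):
    """Return one dictionary per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def index_by_key(rows, key_columns):
    """Map a tuple of key-column values to its row.

    Assumes the combination of columns is unique; otherwise later rows
    overwrite earlier ones.
    """
    return {tuple(row[col] for col in key_columns): row for row in rows}

rows = load_rows(SAMPLE_CSV)

# List of dictionaries: every search is a loop over all rows.
# csv.DictReader yields strings, so the show number is compared as text.
show_4680 = [row for row in rows if row["Show Number"] == "4680"]
categories = sorted({row["Category"] for row in rows})

# Dictionary keyed on a combination of columns: direct lookup by key.
by_key = index_by_key(rows, ("Show Number", "Category", "Value"))
answer = by_key[("4680", "ESPN", "$200")]["Answer"]

print(len(show_4680), categories, answer)
```

For a real file, `csv.DictReader` can be given the open file object instead of the `io.StringIO` wrapper used here.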

René Pijl
  • Aha, I think I see what you mean. Would that make this question a duplicate of [this question, perhaps](https://stackoverflow.com/questions/14091387/creating-a-dictionary-from-a-csv-file)? – BruceWayne Oct 27 '17 at 07:52
  • In that question he says "I would like the first row of the CSV file to be used as the 'key' field for the dictionary". That appears to be a different data structure. – René Pijl Oct 27 '17 at 08:00
  • Hm then maybe I misunderstood your first point. Can you link me to an example, or maybe some mock code to show what you mean? It sounds promising! – BruceWayne Oct 27 '17 at 08:03
1

Your current approach won't split the columns as you expect.
One more point: csv.reader expects a comma as its default delimiter, whereas the columns in your file, as shown, are separated by an arbitrary number of spaces. A different approach is needed to achieve the goal.

I would recommend a pandas solution for this case:
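To illustrate the delimiter point, here is a small sketch that feeds `csv.reader` one line laid out as in the question's table and one comma-separated version of the same row. The comma-separated line is an assumed rendering of that row, not taken from the actual file.

```python
import csv
import io

# The first data row as it appears in the question (columns padded with spaces).
spaced_line = "4680        12/31/2004  Jeopardy!   HISTORY          $200   Question 1  Copernicus\n"
# The same row with commas between fields (assumed layout of a real CSV file).
comma_line = "4680,12/31/2004,Jeopardy!,HISTORY,$200,Question 1,Copernicus\n"

spaced_row = next(csv.reader(io.StringIO(spaced_line)))
comma_row = next(csv.reader(io.StringIO(comma_line)))

# Without commas the whole line comes back as a single field.
print(len(spaced_row))
# With commas each column becomes its own field.
print(len(comma_row))
```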

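import pandas as pd

file_name = 'JEOPARDY_CSV.csv'

def get_data(csv_file):
    # Split on runs of two or more whitespace characters;
    # a regex separator requires the python parsing engine.
    df = pd.read_csv(csv_file, sep=r'\s{2,}', engine='python', header=0)
    # Build one dictionary per row, keyed by column header.
    return df.to_dict('records')

print(get_data(file_name))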

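The output is the required list of dictionaries (note that the fourth row is mis-split, because only a single space separates THE COMPANY LINE from $200 in the sample):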

[{'Question': 'Question 1', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Copernicus', 'Category': 'HISTORY', 'Show Number': 4680, 'Round': 'Jeopardy!'}, {'Question': 'Question 2', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Jim Thorpe', 'Category': 'ESPN', 'Show Number': 4680, 'Round': 'Jeopardy!'}, {'Question': 'Question 3', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Arizona', 'Category': 'EVERYBODY TALKS', 'Show Number': 4680, 'Round': 'Jeopardy!'}, {'Question': "McDonald's", 'Value': 'Question 4', 'Air Date': '12/31/2004', 'Answer': None, 'Category': 'THE COMPANY LINE $200', 'Show Number': 4680, 'Round': 'Jeopardy!'}, {'Question': 'Question 5', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'John Adams', 'Category': 'EPITAPHS', 'Show Number': 4680, 'Round': 'Jeopardy!'}]

Going further, pandas lets you group column values, get unique records, perform aggregations, and much more.
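As a hedged sketch of those follow-up operations, the snippet below covers the two queries mentioned in the question (rows for a given show number, unique categories) plus a simple grouped count. It assumes the data is comma-separated, and the inline sample reuses three rows from the question rather than reading the real file.

```python
import io

import pandas as pd

# Three rows from the question, written as comma-separated text
# (assumed layout of the real CSV file).
SAMPLE_CSV = """Show Number,Air Date,Round,Category,Value,Question,Answer
4680,12/31/2004,Jeopardy!,HISTORY,$200,Question 1,Copernicus
4680,12/31/2004,Jeopardy!,ESPN,$200,Question 2,Jim Thorpe
4680,12/31/2004,Jeopardy!,EVERYBODY TALKS,$200,Question 3,Arizona
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# "Return all questions where show number is 4680": boolean-mask filter.
show_4680 = df[df["Show Number"] == 4680]

# "For all categories, return the unique ones", in order of first appearance.
unique_categories = df["Category"].unique().tolist()

# A simple aggregation: number of questions per round.
questions_per_round = df.groupby("Round")["Question"].count().to_dict()

print(show_4680["Answer"].tolist())
print(unique_categories)
print(questions_per_round)
```

With the real file, `pd.read_csv('JEOPARDY_CSV.csv')` would replace the `io.StringIO` wrapper.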

RomanPerekhrest
  • Ah sorry, the original data is in a csv, I edited to add that note. I've heard of pandas before, I'll look in to that thanks! – BruceWayne Oct 27 '17 at 08:08