
I have this data set:

Epitope,ID,Frequency,Assay
AVNIVGYSNAQGVDY,123431,27.0,Tetramer
DIKYTWNVPKI,887473,50.0,3H
LRQMRTVTPIRMQGG,34234,11.9,Elispot
AVNIVGYSNAQGVDY,3456,67.0,Tetramer

I would like to know how to obtain an output like this:

d = {'AVNIVGYSNAQGVDY': [ID[123431,3456],Frequency[27.0,67.0],Assay['Tetramer']], 'DIKYTWNVPKI': [ID[887473],Frequency[50.0],Assay['3H']], 'LRQMRTVTPIRMQGG': [ID[34234],Frequency[11.9],Assay['Elispot']]}

This makes a dictionary with every unique Epitope as a key. Each value holds one list per category (ID, Frequency and Assay), and the values from repeated rows are appended to those lists, as you can see.

I can read the file with this code:

result = {}
for row in reader:
    dictlist = []
    key = row.pop('Epitope')
    if key in result:
        pass
    result[key] = row
print result

but I am not sure how to handle the duplicates; that is, how to append the ID, Frequency and Assay when an Epitope is repeated.

Martijn Pieters
  • Why did you [ask a question](http://stackoverflow.com/questions/25001269/open-csv-file-in-python-to-customize-dictionary), accept one answer, but then use code from a deleted answer instead? You're getting `'123431'`, not `ID[123431]`, etc. for your values because you didn't use the code that showed you how to get what you want. – abarnert Jul 28 '14 at 21:15
  • Also, what exactly is `ID[123431,3456]` supposed to mean? Is `ID` a 2-dimensional NumPy array or something? – abarnert Jul 28 '14 at 21:16

1 Answer


You'll need to use lists as values and append to each list, per key in the row:

from collections import defaultdict

result = defaultdict(lambda: defaultdict(list))

for row in reader:
    epitope = row.pop('Epitope')
    entry = result[epitope]
    for key, value in row.items():
        entry[key].append(value)

Demo:

>>> from collections import defaultdict
>>> import csv
>>> sample = '''\
... Epitope,ID,Frequency,Assay
... AVNIVGYSNAQGVDY,123431,27.0,Tetramer
... DIKYTWNVPKI,887473,50.0,3H
... LRQMRTVTPIRMQGG,34234,11.9,Elispot
... AVNIVGYSNAQGVDY,3456,67.0,Tetramer
... '''
>>> reader = csv.DictReader(sample.splitlines())
>>> result = defaultdict(lambda: defaultdict(list))
>>> for row in reader:
...     epitope = row.pop('Epitope')
...     entry = result[epitope]
...     for key, value in row.items():
...         entry[key].append(value)
... 
>>> from pprint import pprint
>>> for key, value in result.items():
...     print key, dict(value)
... 
AVNIVGYSNAQGVDY {'Frequency': ['27.0', '67.0'], 'Assay': ['Tetramer', 'Tetramer'], 'ID': ['123431', '3456']}
DIKYTWNVPKI {'Frequency': ['50.0'], 'Assay': ['3H'], 'ID': ['887473']}
LRQMRTVTPIRMQGG {'Frequency': ['11.9'], 'Assay': ['Elispot'], 'ID': ['34234']}
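
The values collected above are all strings, and `Assay` repeats `'Tetramer'`, while the output sketched in the question uses numbers and lists each assay once. One possible way to get closer to that is sketched below. The `CONVERTERS` mapping and the `group_rows` helper are hypothetical names introduced for illustration, and the choice of `int` for `ID` and `float` for `Frequency` is an assumption about the data. `print` is called with a single parenthesised argument so the snippet runs on both Python 2.7 and Python 3.

```python
import csv
from collections import defaultdict

# Assumed column types; adjust to the real data.
CONVERTERS = {'ID': int, 'Frequency': float, 'Assay': str}

sample = '''\
Epitope,ID,Frequency,Assay
AVNIVGYSNAQGVDY,123431,27.0,Tetramer
DIKYTWNVPKI,887473,50.0,3H
LRQMRTVTPIRMQGG,34234,11.9,Elispot
AVNIVGYSNAQGVDY,3456,67.0,Tetramer
'''


def group_rows(rows):
    """Group rows by Epitope, converting values and de-duplicating Assay."""
    result = defaultdict(lambda: defaultdict(list))
    for row in rows:
        row = dict(row)  # work on a copy so the caller's row is untouched
        epitope = row.pop('Epitope')
        entry = result[epitope]
        for key, value in row.items():
            converted = CONVERTERS[key](value)
            # Record each assay name only once per epitope.
            if key == 'Assay' and converted in entry[key]:
                continue
            entry[key].append(converted)
    # Turn the nested defaultdicts into plain dicts for easier printing.
    return {epitope: dict(entry) for epitope, entry in result.items()}


typed = group_rows(csv.DictReader(sample.splitlines()))
print(typed['AVNIVGYSNAQGVDY'])
```

Skipping repeated assay names means the `Assay` list no longer lines up index by index with `ID` and `Frequency`; if that pairing matters, keep the duplicates as in the demo above.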
Martijn Pieters
  • It worked fine, thank you. But how do I access the values? For example, how do I get the 27.0 from the Frequency? – Sebastian Carrasco Jul 28 '14 at 21:44
  • @SebastianCarrasco: `result['AVNIVGYSNAQGVDY']['Frequency']` is the list, `result['AVNIVGYSNAQGVDY']['Frequency'][0]` is the first value. – Martijn Pieters Jul 28 '14 at 21:58
  • Last question from me: I need to apply a function `f()` to each list in the dictionary, for example to `result['AVNIVGYSNAQGVDY']['Frequency']`. What does the loop look like that iterates over each list (e.g. `['27.0', '67.0']`) under each key (e.g. `'ID'`, `'Assay'`) for each epitope (e.g. `'AVNIVGYSNAQGVDY'`, `'DIKYTWNVPKI'`)? Thank you so much for your time. – Sebastian Carrasco Jul 28 '14 at 22:07
  • @SebastianCarrasco: in my demonstration you can already see how to loop over both keys and values; you can also loop over just the values with `dict.values()`, for example. Nest your loops. – Martijn Pieters Jul 29 '14 at 07:00
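
The indexing and the nested loops described in the comments above could look roughly like this. It is only a sketch: `result` is written out by hand here to stand in for the dictionary built in the answer, and `f` is a hypothetical placeholder (it just counts the entries), since the real function is not given.

```python
# Hand-written stand-in for the `result` built in the answer (values are strings).
result = {
    'AVNIVGYSNAQGVDY': {
        'ID': ['123431', '3456'],
        'Frequency': ['27.0', '67.0'],
        'Assay': ['Tetramer', 'Tetramer'],
    },
    'DIKYTWNVPKI': {
        'ID': ['887473'],
        'Frequency': ['50.0'],
        'Assay': ['3H'],
    },
}


def f(values):
    """Hypothetical placeholder for the real function: count the entries."""
    return len(values)


# Single value: the first Frequency of one epitope, converted from str to float.
first_frequency = float(result['AVNIVGYSNAQGVDY']['Frequency'][0])

# Nested loops: the outer loop walks the epitopes, the inner loop walks
# each category list of that epitope and applies f() to it.
applied = {}
for epitope, entry in result.items():
    applied[epitope] = {}
    for key, values in entry.items():
        applied[epitope][key] = f(values)

print(first_frequency)
print(applied['AVNIVGYSNAQGVDY'])
```

Storing the outcome in a second dictionary (`applied`) leaves `result` unchanged; if `f()` is only needed for its side effects, the body of the inner loop can simply call `f(values)`.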