4

Beginner question: I have a dictionary where the values are lists of a variable number of strings. Ultimately, I would like to write each dictionary entry to a single tab-delimited line, with the key as column 1 and the individual items from the value list as columns 2 through n. I have used the following code to generate an output file with the key as column 1 and the whole value list as column 2, but I'm not sure how to proceed from there.

mydict = {'spider':['kate', 'susan'],
          'fish':['kate'],
          'dog':['andy'], 
          'cat':['andy','colby','jeff']} 
f = open('outfile.txt', 'w') 
writer = csv.writer(f, delimiter = '\t')
for key, value in orfdict.iteritems():
    writer.writerow([orf] + [value])

The Python documentation suggests that you can use zip() to create a list from key:value pairs, but when I try this at the interactive prompt:

>>> for key,value in mydict.iteritems():
...     mypair = zip(key,value)
...     print mypair

I get this strange output, so I'm obviously not understanding things:

[('f', 'kate')]
[('c', 'andy'), ('a', 'colby'), ('t', 'jeff')]
[('s', 'kate'), ('p', 'susan')]
[('d', 'andy')]

Is the simplest way to do this going to be creating an empty list for each iteration over the dictionary, then appending to that list first the key, and then each of the values with an indented for loop? I feel like I must be missing something.

juliomalegria
pandaSeq
  • This is what JSON, XML and other structured document formats are designed for. Why do you need to use a tab-delimited csv? – John Lyon Apr 02 '12 at 23:02
  • The "strange output" of zip is explained by noting that taking a string in a context that expects a sequence yields the individual characters of the string; that is, a string acts like a list of characters. So zipping 'cat' with ['andy', 'colby', 'jeff'] breaks cat into ['c', 'a', 't']. – Russell Borogove Apr 03 '12 at 00:19
  • @jozzas - because I don't know anything about those filetypes yet...but I've added them to my "to learn" list. Thanks! – pandaSeq Apr 03 '12 at 15:42
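
The zip() behaviour described in the comment above can be reproduced in a few lines. This is a minimal sketch written for Python 3 (the question uses Python 2, where zip() returns a list directly); the variable names are only illustrative.

```python
key = 'cat'
value = ['andy', 'colby', 'jeff']

# Iterating over a string yields its individual characters.
print(list(key))  # ['c', 'a', 't']

# zip() walks both arguments in parallel, so the key is split into
# characters and each character is paired with one name.
pairs = list(zip(key, value))
print(pairs)  # [('c', 'andy'), ('a', 'colby'), ('t', 'jeff')]
```

So zip() pairs items position by position; it is not the tool for putting one key in front of a list of values.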

4 Answers

5

Try this to add a single value to an existing list:

writer.writerow([key] + value)

(key is a single value, value is already a list)
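
For reference, here is a sketch of the whole script with this fix applied, written for Python 3, where `.items()` replaces `.iteritems()`. The helper name `write_rows` is hypothetical, and `lineterminator='\n'` is an assumption made here so the output is easy to inspect (the csv module's default is `'\r\n'`). It writes to an in-memory buffer; for a real file you would pass something like `open('outfile.txt', 'w', newline='')` instead.

```python
import csv
import io

mydict = {'spider': ['kate', 'susan'],
          'fish': ['kate'],
          'dog': ['andy'],
          'cat': ['andy', 'colby', 'jeff']}


def write_rows(data, fileobj):
    """Write one tab-delimited line per dictionary entry (hypothetical helper)."""
    writer = csv.writer(fileobj, delimiter='\t', lineterminator='\n')
    for key, value in data.items():
        # [key] is a one-item list; adding the value list appends its items.
        writer.writerow([key] + value)


buffer = io.StringIO()
write_rows(mydict, buffer)
print(buffer.getvalue())
```

Each output line starts with the key, followed by one tab-separated column per name.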

poke
  • Got it, thanks! I didn't really understand the function of the square brackets, but this makes sense. – pandaSeq Apr 03 '12 at 15:40
  • The square brackets basically create a list. So `[1,2,3]` creates a list with the items `1`, `2`, and `3`. In the same way, `[key]` creates a list with a single item: `key`. You then concatenate that new list with the already existing one. – poke Apr 03 '12 at 17:08
2

It looks like you renamed some of your variables and didn't rename others. I'm assuming you meant for your example code to read:

mydict = {'spider':['kate', 'susan'],
          'fish':['kate'],
          'dog':['andy'], 
          'cat':['andy','colby','jeff']} 
f = open('outfile.txt', 'w') 
writer = csv.writer(f, delimiter = '\t')
for key, value in mydict.iteritems():
    writer.writerow([key] + [value])

The csv writer seems unnecessary in this case. Why not use:

mydict = {'spider':['kate', 'susan'],
          'fish':['kate'],
          'dog':['andy'],
          'cat':['andy','colby','jeff']}
f = open('outfile.txt', 'w')
for key, value in mydict.iteritems():
    f.write('%s\t%s\n' % (key,'\t'.join(value)))
f.close()
  • Hi Keith, this is along the lines of things I have tried with no success. In this case, this code gives me the error: Traceback (most recent call last): File "/Users/zuma/scripts/stackoverflow.py", line 9, in f.write('%s\t%s\n') % (key,'\t'.join(value)) TypeError: unsupported operand type(s) for %: 'NoneType' and 'tuple' – pandaSeq Apr 03 '12 at 15:36
  • I had two errors that I would have identified had I tested the code, my apologies -- the code is now fixed. – Keith Schoenefeld Apr 03 '12 at 17:56
  • In fairness to the accepted answer and your use of the csv library, it will handle things properly if a value in your dict contains the delimiter. In other words, if 'kate' were 'kate\tand\tjim', my code would produce three columns where there should be just one, whereas the csv approach would correctly produce a single column for 'kate\tand\tjim', wrapped in double quotes. I point this out because it makes the answer that uses csv more correct, and it is also something you need to watch for when reading the resulting csv file. – Keith Schoenefeld Apr 03 '12 at 18:15
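
The difference described in the last comment can be checked with a small sketch. It is written for Python 3 and uses an in-memory buffer; `lineterminator='\n'` is an assumption made here for readability, and the sample row is only illustrative.

```python
import csv
import io

row = ['spider', 'kate\tand\tjim']  # the second field contains tabs

# Plain join: the embedded tabs are indistinguishable from delimiters.
joined = '\t'.join(row)
fields_from_join = joined.split('\t')
print(fields_from_join)  # ['spider', 'kate', 'and', 'jim']

# csv.writer: the field containing the delimiter is wrapped in double quotes.
buffer = io.StringIO()
csv.writer(buffer, delimiter='\t', lineterminator='\n').writerow(row)
written = buffer.getvalue()
print(repr(written))  # 'spider\t"kate\tand\tjim"\n'

# Reading it back with csv.reader recovers the original two fields.
fields_from_csv = next(csv.reader(io.StringIO(written), delimiter='\t'))
print(fields_from_csv)  # ['spider', 'kate\tand\tjim']
```

The quoted form only round-trips cleanly if the reader also understands csv quoting, which is the caveat raised in the comment.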
0

Try changing your for loop to the following:

for key, value in orfdict.iteritems():
    writer.writerow([key] + value)

Because the values in orfdict are lists, in each iteration value will be a list. For example in the first iteration key could be 'spider', and value would be ['kate', 'susan'], so [key] + value would become ['spider'] + ['kate', 'susan'] or ['spider', 'kate', 'susan'].
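
The difference between the two expressions is easy to see at the prompt. A minimal sketch, using illustrative variable names:

```python
key = 'spider'
value = ['kate', 'susan']

# Wrapping value in brackets nests the whole list as a single item,
# which is why the original code produced only two columns.
nested = [key] + [value]
print(nested)  # ['spider', ['kate', 'susan']]

# Concatenating the list itself gives one item per name.
flat = [key] + value
print(flat)  # ['spider', 'kate', 'susan']
```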

Andrew Clark
0
>>> [(x, y) for x, x2 in mydict.iteritems() for y in x2]
[('fish', 'kate'), ('cat', 'andy'), ('cat', 'colby'), ('cat', 'jeff'), ('spider', 'kate'), ('spider', 'susan'), ('dog', 'andy')]
Ignacio Vazquez-Abrams