Handling for redundancy in a list

Question

Let's say I have a list of tuples with states and counties:

stList = [('NJ', 'Burlington County'),
 ('NJ', 'Middlesex County'),
 ('VA', 'Frederick County'),
 ('MD', 'Montgomery County'),
 ('NC', 'Lee County'),
 ('NC', 'Alamance County')]

For each of these items, I want to zip the state with the county, like this:

new_list = [{'NJ': 'Burlington County'},
{'NJ': 'Middlesex County'},
{'VA': 'Frederick County'},
{'MD': 'Montgomery County'},
{'NC': 'Lee County'},
{'NC': 'Alamance County'}]

I tried something like this, but it doesn't work correctly (it iterates through each 'letter' and zips them individually):

new_list = []
for item in stList:
  d1 = dict(zip(item[0], item[1]))
  new_list.append(d1)

Returns:

 [{'N': 'B', 'J': 'u'},
 {'N': 'M', 'J': 'i'},
 {'V': 'F', 'A': 'r'},
 {'M': 'M', 'D': 'o'},
 {'N': 'L', 'C': 'e'},
 {'N': 'A', 'C': 'l'}]

To make things more complicated, my end goal is to actually have a list of dictionaries for each state(key), that has the counties(value) as a list. How can I fix the zipped dictionary and then put the counties as a list for each state?

final_list = [{'NJ': ['Burlington County', 'Middlesex County']},
{'VA': 'Frederick County'},
{'MD': 'Montgomery County'},
{'NC': ['Lee County', 'Alamance County']}]

Is there a reason you're making a list of dictionaries instead of a single dictionary? — Patrick Haugh, Sep 14 '18 at 14:51
This is my way of splicing up a very complicated problem into multiple parts essentially. So to answer your question; yes. I want to be able to easily iterate over each item later. — gwydion93, Sep 14 '18 at 16:20

score 7 · Accepted Answer · answered Sep 14 '18 at 14:51


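You get the wrong result because zip treats each string as an iterable of characters, so it pairs the characters of the state code with the first characters of the county name. This is expected behavior.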
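To see this in isolation, here is a small sketch using one tuple from the question. The variable names `item`, `pairs`, `single` and `same` are only illustrative.

```python
# One (state, county) pair taken from the question's stList.
item = ('NJ', 'Burlington County')

# zip walks both strings character by character and stops
# as soon as the shorter one (the state code) is exhausted.
pairs = list(zip(item[0], item[1]))
print(pairs)  # [('N', 'B'), ('J', 'u')]

# Building the dict from the two strings directly keeps them whole.
single = {item[0]: item[1]}
print(single)  # {'NJ': 'Burlington County'}

# dict() also accepts an iterable of (key, value) pairs,
# so wrapping the tuple in a list gives the same result.
same = dict([item])
```

Either of the last two forms produces the one-entry dictionaries shown in the question's new_list.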

You may get something close to what you want like this:

result = dict()
for state, county in stList:
    result.setdefault(state, list()).append(county)

print(result)

The result is a single dictionary that maps each state to a list of its counties. Output:

{'NJ': ['Burlington County', 'Middlesex County'], 'VA': ['Frederick County'], 'MD': ['Montgomery County'], 'NC': ['Lee County', 'Alamance County']}
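If a list of one-key dictionaries is really needed, as in the question's final_list, one possible extra step is to split the single dictionary afterwards. This is only a sketch; it assumes Python 3.7+ so that the states come out in first-seen order, and unlike the question's example every value is a list, even for states with one county.

```python
stList = [('NJ', 'Burlington County'),
          ('NJ', 'Middlesex County'),
          ('VA', 'Frederick County'),
          ('MD', 'Montgomery County'),
          ('NC', 'Lee County'),
          ('NC', 'Alamance County')]

# Group counties by state, as in the loop above.
result = {}
for state, county in stList:
    result.setdefault(state, []).append(county)

# Split the grouped dictionary into one small dictionary per state.
final_list = [{state: counties} for state, counties in result.items()]
print(final_list)
```

Keeping every value a list means later code can iterate over the counties without checking the type first.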


This solution works great and produces the exact results I was looking for. Hypothetically speaking, if I wanted to add a third item to each tuple: `[('NJ', 'Burlington County', '3/12/2018'), ('NJ', 'Middlesex County', '7/3/2011'), ('NJ', 'Burlington County', '8/13/2015')]` so that the end result would be `{'NJ': [{'Burlington County': [ '3/12/2018', '8/13/2015']},{'Middlesex County':['7/3/2011']} ]}`, how would I adjust? – gwydion93 Sep 14 '18 at 16:54
1

@gwydion93 In this hypothetical case you need one more step using the very same technique: just make the method chain longer. Start the for loop with `for state, county, date in stList:`, with the loop body being `result.setdefault(state, dict()).setdefault(county, list()).append(date)`. You will get a dictionary of dictionaries of lists. – Sep 14 '18 at 17:29
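Written out in full, the chained setdefault described in the comment above might look like the sketch below. The name `stList3` is made up here to hold the three-item tuples from the comment. Note that the result is a dictionary of dictionaries, not the list of one-key dictionaries shown in the comment's desired output.

```python
# Three-item tuples from the comment; stList3 is an illustrative name.
stList3 = [('NJ', 'Burlington County', '3/12/2018'),
           ('NJ', 'Middlesex County', '7/3/2011'),
           ('NJ', 'Burlington County', '8/13/2015')]

result = {}
for state, county, date in stList3:
    # The first setdefault returns (or creates) the per-state dict,
    # the second returns (or creates) the per-county list of dates.
    result.setdefault(state, {}).setdefault(county, []).append(date)

print(result)
# {'NJ': {'Burlington County': ['3/12/2018', '8/13/2015'],
#         'Middlesex County': ['7/3/2011']}}
```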

user2390182 · Answer 2 · 2018-09-14T19:50:51.673

3

Poolka's setdefault solution is sound, performant, and readable, but can be made even more intuitive with a defaultdict:

from collections import defaultdict

result = defaultdict(list)
for state, county in stList:
    result[state].append(county)

If there are triplets with dates in your list, you can do a nested version:

result = defaultdict(lambda: defaultdict(list))
for state, county, date in stList:
    result[state][county].append(date)

For a one-liner without any of the above mentioned attributes, you can use itertools.groupby ;)

from itertools import groupby
{k: [x[1] for x in g] for k, g in groupby(sorted(stList), key=lambda x: x[0])}

# {'NC': ['Alamance County', 'Lee County'], 
#  'MD': ['Montgomery County'], 
#  'NJ': ['Burlington County', 'Middlesex County'], 
#  'VA': ['Frederick County']}

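Algorithmically, this is worse because it has to sort the initial list first, which costs O(n log n) compared with the single O(n) pass of the dictionary-based approaches.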
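The sort is not optional, because groupby only merges items that sit next to each other. The sketch below uses a small made-up list (`pairs`) in which the two NJ entries are separated, to show what happens with and without sorting. It sorts on the state alone, so the counties keep their original relative order.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative input: the two NJ entries are not adjacent.
pairs = [('NJ', 'Burlington County'),
         ('VA', 'Frederick County'),
         ('NJ', 'Middlesex County')]

state_of = itemgetter(0)

# Without sorting, NJ shows up as two separate groups.
unsorted_groups = [(k, [county for _, county in g])
                   for k, g in groupby(pairs, key=state_of)]
print(unsorted_groups)
# [('NJ', ['Burlington County']), ('VA', ['Frederick County']),
#  ('NJ', ['Middlesex County'])]

# Sorting by state first (a stable sort) makes equal keys adjacent.
grouped = {k: [county for _, county in g]
           for k, g in groupby(sorted(pairs, key=state_of), key=state_of)}
print(grouped)
# {'NJ': ['Burlington County', 'Middlesex County'], 'VA': ['Frederick County']}
```

Feeding the unsorted groups into a dict would silently keep only the last NJ group, which is an easy bug to miss.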

edited Sep 14 '18 at 19:50

answered Sep 14 '18 at 15:40

user2390182


I asked @Poolka this, but if I wanted to add an extra `date` item to each tuple and make `[('NJ', 'Burlington County', '3/12/2018'), ('NJ', 'Middlesex County', '7/3/2011'), ('NJ', 'Burlington County', '8/13/2015')]` so that the end result would be `{'NJ': [{'Burlington County': [ '3/12/2018', '8/13/2015']},{'Middlesex County':['7/3/2011']} ]}`, how would I do this with your method? It looks like `defaultdict` only handles 2 items: one as `k` and the other as `v`. – gwydion93 Sep 14 '18 at 19:37
1

@gwydion93 I added the example in my answer, it is absolutely possible with a defaultdict. – user2390182 Sep 14 '18 at 19:51
OK, last comment: when I run the above, it gives me a weird output: `defaultdict(<function __main__.<lambda>()>, {'NJ': defaultdict(list, {'Burlington County': ['3/12/2018', '8/13/2015'], 'Middlesex County': ['7/3/2011']})})`. Is there a way to convert that to a regular dictionary without the `defaultdict(<function <lambda> at 0x000001F25F24DD08>, etc...` part? – gwydion93 Sep 14 '18 at 20:06
1

@gwydion93 That is just its `repr` (string representation). For all intents and purposes, it behaves like a regular dict. In fact, the data structure is a `dict` in OOP terms, because `defaultdict` is a subclass of `dict`. Check `isinstance(result, dict)`. – user2390182 Sep 14 '18 at 20:25
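If the plain-dict repr is still wanted, for printing or to stop missing keys from being created silently, one option is a small recursive conversion. The helper `to_plain_dict` below is a hypothetical name, not part of the standard library, and the sketch assumes the nested values are either dicts or non-dict leaves such as lists.

```python
from collections import defaultdict


def to_plain_dict(obj):
    """Recursively copy dicts (including defaultdicts) into plain dicts."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    # Lists of dates and other leaf values are returned unchanged.
    return obj


result = defaultdict(lambda: defaultdict(list))
for state, county, date in [('NJ', 'Burlington County', '3/12/2018'),
                            ('NJ', 'Middlesex County', '7/3/2011'),
                            ('NJ', 'Burlington County', '8/13/2015')]:
    result[state][county].append(date)

print(isinstance(result, dict))  # True, defaultdict subclasses dict

plain = to_plain_dict(result)
print(plain)
# {'NJ': {'Burlington County': ['3/12/2018', '8/13/2015'],
#         'Middlesex County': ['7/3/2011']}}
```

After the conversion, looking up a missing state raises KeyError again instead of inserting an empty entry.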

score 2 · Answer 3 · answered Sep 14 '18 at 15:11

I don't think zip() is right for this. Here are two potential solutions. If you have to use a list to store the results you will have to go a step further after this answer. However, if using a dict for the results would work, then this answer might get you there:

stList = [('NJ', 'Burlington County'),
 ('NJ', 'Middlesex County'),
 ('VA', 'Frederick County'),
 ('MD', 'Montgomery County'),
 ('NC', 'Lee County'),
 ('NC', 'Alamance County')]


new_list = []
for item in stList:
    new_list.append({item[0]:item[1]})

print("new list:", new_list)


new_dict = {}
for item in stList:
    if item[0] in new_dict:
        new_dict[item[0]].append(item[1])
    else:
        new_dict[item[0]] = [item[1]]

print("new dict:", new_dict)

These solutions yield the following:

new list: [{'NJ': 'Burlington County'}, {'NJ': 'Middlesex County'}, {'VA': 'Frederick County'}, {'MD': 'Montgomery County'}, {'NC': 'Lee County'}, {'NC': 'Alamance County'}]

new dict: {'VA': ['Frederick County'], 'NJ': ['Burlington County', 'Middlesex County'], 'NC': ['Lee County', 'Alamance County'], 'MD': ['Montgomery County']}

score 2 · Answer 4 · answered Sep 14 '18 at 15:31

A list comprehension seems to be the easiest way to build the first list:

[{i[0]:i[1]} for i in stList]

OUTPUT

[{'NJ': 'Burlington County'},
{'NJ': 'Middlesex County'},
{'VA': 'Frederick County'},
{'MD': 'Montgomery County'},
{'NC': 'Lee County'},
{'NC': 'Alamance County'}]

Woody1193 · Answer 5 · 2018-09-14T15:01:05.883

The reason your code is broken is probably a misunderstanding of zip. It treats each string as a separate iterable and pairs up their characters until the shorter string (the two-letter state code) runs out, so only the first two characters of the county name, s[:2], are used. If you want a mapping between states and the counties in each state, you could try this:

mapping = {}
for state, cty in stList:
    if state in mapping:
        mapping[state].append(cty)
    else:
        mapping[state] = [cty]

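That's the simplest way to do it, at any rate. However, if you want to use itertools, you could do a groupby like this. Note that groupby only groups adjacent items, so the list is sorted by state first:

from itertools import groupby
mapping = dict( [ (k, [gg[1] for gg in g]) for k, g in groupby(sorted(stList, key = lambda x: x[0]), key = lambda x: x[0]) ] )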
