
I'm not sure if I am asking the question in the right way, but this is my issue:

I have a list of dicts in the following format:

[
{'user': 'joe', 'IndexUsed': 'a'}, 
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'}, 
{'user': 'admin', 'IndexUsed': 'a'}, 
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
...
]

I want my final result to look like this:

[
{'user': 'joe', 'IndexUsed': ['a', 'b']}, 
{'user': 'admin', 'IndexUsed': ['a', 'c']}, 
{'user': 'hugo', 'IndexUsed': ['a', 'd']},
]

In essence, I want to deduplicate the IndexUsed values and reduce the list to a single dict per user.

I have looked into reducers and dict comprehensions, and searched on StackOverflow, but I have had trouble finding examples that use strings. Most of the examples I have found combine integers into a final int/float, whereas I want to collect the strings into a single list per user. Could you help me understand how to approach this problem?

  • So you want the output to remain a list of dicts, but with the IndexUsed values consolidated into a list? – itprorh66 Jan 12 '21 at 14:47

4 Answers

from collections import defaultdict


data = [{'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'b', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'admin'},
 {'IndexUsed': 'c', 'user': 'admin'},
 {'IndexUsed': 'a', 'user': 'hugo'},
 {'IndexUsed': 'd', 'user': 'hugo'}]

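# Map each user to the set of indexes seen for that user.
indexes_used = defaultdict(set)
for d in data:
    indexes_used[d['user']].add(d['IndexUsed'])

# Build one dict per user; sorted() accepts a set and returns a list.
result = []
for k, v in indexes_used.items():
    result.append({'user': k, 'IndexUsed': sorted(v)})

print(*result)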

Outputs:

{'user': 'joe', 'IndexUsed': ['a', 'b']} {'user': 'admin', 'IndexUsed': ['a', 'c']} {'user': 'hugo', 'IndexUsed': ['a', 'd']}

Note: for the unaware, defaultdict calls the function passed to it (set in this case) as a factory to create the value for any missing key. So every key of indexes_used maps to a set holding the indexes used by that user. Using a set also discards duplicates. At the end, each set is converted to a sorted list while building the required IndexUsed key.
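
To see the factory behaviour described in the note in isolation, here is a minimal sketch. The name `seen` and the sample keys are made up for illustration and are not part of the answer's code.

```python
from collections import defaultdict

# Hypothetical name, for illustration only.
seen = defaultdict(set)

# Accessing a missing key calls set() and stores the new empty set under that key.
seen['joe'].add('a')
seen['joe'].add('a')   # duplicate, ignored by the set
seen['joe'].add('b')

print(sorted(seen['joe']))   # ['a', 'b']
print('admin' in seen)       # False: a membership test does not create the key
```

A plain dict would raise KeyError on the first `seen['joe']` access, which is why the loop in the answer needs no "is this user already present?" check.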

progmatico

If the dictionaries are guaranteed to be grouped together by user, then you could use itertools.groupby to process each group of dictionaries separately:

from itertools import groupby
from operator import itemgetter

data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'},
]

merged_data = [{"user": key, "IndexUsed": list({i: None for i in map(itemgetter("IndexUsed"), group)})} for key, group in groupby(data, key=itemgetter("user"))]
for d in merged_data:
    print(d)

Output:

{'user': 'joe', 'IndexUsed': ['a', 'b']}
{'user': 'admin', 'IndexUsed': ['a', 'c']}
{'user': 'hugo', 'IndexUsed': ['a', 'd']}

This was just the first thing I came up with, but I don't like it for several reasons. First, like I said, it assumes that the original dictionaries are grouped together by the user key. In addition, long list comprehensions are not readable and should be avoided. The merged IndexUsed list is generated by creating a temporary dictionary which maps unique entries to None (ew, gross - a dictionary is used rather than a set, because sets don't preserve insertion order). It also assumes you're using Python 3.7+, where dictionaries are guaranteed to preserve insertion order (you could be more explicit by using collections.OrderedDict, but that's one more import). Finally, you shouldn't have to hardcode the "user" and "IndexUsed" key literals. Someone please suggest a better answer.
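
One possible way to address those objections, offered only as a sketch: the function below does not require the input to be grouped, takes the key names as parameters, and keeps first-seen order. The name `merge_by_key` and its parameters are hypothetical, and the sample data is deliberately ungrouped. It still relies on dicts preserving insertion order (Python 3.7+).

```python
def merge_by_key(records, group_key, value_key):
    """Collect the distinct value_key entries for each group_key, in first-seen order."""
    grouped = {}
    for record in records:
        # setdefault creates the inner dict the first time a group is seen.
        # The inner dict acts as an ordered set: only its keys matter.
        grouped.setdefault(record[group_key], {})[record[value_key]] = None
    return [{group_key: key, value_key: list(values)} for key, values in grouped.items()]


data = [
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},   # same user, not adjacent to the first 'joe'
    {'user': 'joe', 'IndexUsed': 'b'},   # duplicate
]

merged = merge_by_key(data, 'user', 'IndexUsed')
print(merged)
# [{'user': 'joe', 'IndexUsed': ['b', 'a']}, {'user': 'admin', 'IndexUsed': ['a']}]
```

If sorted output is preferred over first-seen order, `list(values)` can be swapped for `sorted(values)`.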

Paul M.

One way to approach this without using any libraries, if you are interested:

arr = [
{'user': 'joe', 'IndexUsed': 'a'}, 
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'}, 
{'user': 'admin', 'IndexUsed': 'a'}, 
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
]

global_dict = {}

for d in arr:
    if d["user"] not in global_dict:
        # First time this user is seen: start a new list.
        global_dict[d["user"]] = [d["IndexUsed"]]
    else:
        global_dict[d["user"]].append(d["IndexUsed"])
        # Deduplicate via a set; note that this does not preserve order.
        global_dict[d["user"]] = list(set(global_dict[d["user"]]))

print(global_dict)

# Now we get a dict with each user as key and a list of distinct IndexUsed values as value
# (the order within each list may vary, because it comes from a set):
# {
#  'joe': ['b', 'a'],
#  'admin': ['c', 'a'],
#  'hugo': ['d', 'a']
# }



final_list = []

for k, v in global_dict.items():
    final_list.append({"user": k, "IndexUsed": v})

print(final_list)

# Desired output (the order within each list may vary)
# [
#  {'user': 'joe', 'IndexUsed': ['b', 'a']},
#  {'user': 'admin', 'IndexUsed': ['c', 'a']},
#  {'user': 'hugo', 'IndexUsed': ['d', 'a']}
# ]

However, if you are a fan of one-liners... let me condense @progmatico's awesome defaultdict approach to just these three lines.

from collections import defaultdict


indexes_used = defaultdict(set)
[indexes_used[d['user']].add(d['IndexUsed']) for d in arr]  # comprehension used only for its side effect
print([{'user': k, 'IndexUsed': sorted(v)} for k, v in indexes_used.items()])

And it's still readable.

Aditya Patnaik
  • Thanks, just edited a little bit. I think *working and readable* is good enough. In Python you should never assume you did it in the *most pythonic way*. There may be yet another idiomatic way. Although elegance usually speaks for itself, I see too many people worried about being *pythonic*. – progmatico Jan 12 '21 at 18:58
  • @progmatico, I fully agree with you. – Aditya Patnaik Jan 13 '21 at 01:19

Without any external library:

l = [
    {'user': 'joe', 'IndexUsed': 'a'}, 
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'}, 
    {'user': 'admin', 'IndexUsed': 'a'}, 
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'}
]

def combinator(l):
    d = {}

    for item in l:
        # Create an empty set the first time a user is seen.
        if item['user'] not in d:
            d[item['user']] = set()
        d[item['user']].add(item['IndexUsed'])

    # One dict per user, with the distinct indexes sorted.
    return [{'user': key, 'IndexUsed': sorted(value)} for key, value in d.items()]


print(combinator(l))
Andrea Ciccotta