
How can I merge duplicate dictionaries of the same type into a single dictionary?

Input Data:

[{'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}, {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}, {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}]

Output Data:

[{'Email Address': ['abc@xyz.com','abc@xyz.com','abc@xyz.com'], 'Email Address Type (Primary/Alternate)': 'Primary Email'}]

3 Answers


EDIT: Based on the exact output you need, try this -

  1. You can use collections.defaultdict for this purpose.
  2. You have to explicitly specify which keys you want to aggregate as a list. That is what the listKeys variable below is for.
  3. The code iterates over the list of dicts and then over each dict's items. If the key is in listKeys, the value is appended to that key's list; otherwise the key is simply overwritten with the latest value.
from collections import defaultdict

l = [{'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}, {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}, {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}]
listKeys = ['Email Address']  # keys whose values should be collected into a list

d = defaultdict(list)
for i in l:
    for k, v in i.items():
        if k in listKeys:
            d[k].append(v)  # collect every value for list keys
        else:
            d[k] = v        # any other key keeps the latest value
output = dict(d)
output
{'Email Address': ['abc@xyz.com', 'abc@xyz.com', 'abc@xyz.com'],
 'Email Address Type (Primary/Alternate)': 'Primary Email'}
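The question's desired output is a list containing a single dict, while the code above returns a bare dict. Below is a minimal sketch that wraps the same listKeys idea in a reusable function and returns that exact shape; the function name merge_records and its list_keys parameter are hypothetical, not part of the original answer.

```python
from collections import defaultdict


def merge_records(records, list_keys):
    """Hypothetical helper: merge a list of dicts into [one dict].

    Keys in list_keys collect every value into a list;
    any other key keeps the last value seen.
    """
    merged = defaultdict(list)
    for record in records:
        for key, value in record.items():
            if key in list_keys:
                merged[key].append(value)
            else:
                merged[key] = value
    # Wrap in a list to match the output shape shown in the question
    return [dict(merged)]


records = [
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
]

result = merge_records(records, list_keys={'Email Address'})
print(result)
```

Passing list_keys as a set keeps the membership check constant-time. Deciding which keys belong in it is still up to the caller.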

1. Repeated items in list

You can use collections.defaultdict for this purpose:

from collections import defaultdict

# Every value for a key is appended to that key's list, duplicates included
d = defaultdict(list)

for i in l:
    for k, v in i.items():
        d[k].append(v)

output = dict(d)
output
{'Email Address': ['abc@xyz.com', 'abc@xyz.com', 'abc@xyz.com'],
 'Email Address Type (Primary/Alternate)': ['Primary Email',
  'Primary Email',
  'Primary Email']}
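The same grouping also works without any import, using the built-in dict.setdefault. A minimal sketch, assuming the same three-record input:

```python
records = [
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
]

merged = {}
for record in records:
    for key, value in record.items():
        # setdefault inserts an empty list the first time a key is seen,
        # then returns the stored list so we can append to it
        merged.setdefault(key, []).append(value)

print(merged)
```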

2. Unique items in list

If you only want unique items:

d = defaultdict(list)

for i in l:
    for k, v in i.items():
        # Append only values not already collected for this key
        if v not in d[k]:
            d[k].append(v)

output = dict(d)
output
{'Email Address': ['abc@xyz.com'],
 'Email Address Type (Primary/Alternate)': ['Primary Email']}
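The `if v not in d[k]` check scans the whole list on every value, which gets slow when there are many values per key. A sketch of an alternative that uses plain dicts as insertion-ordered sets (dicts preserve insertion order from Python 3.7 on), assuming all values are hashable:

```python
from collections import defaultdict

records = [
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
]

# Each inner dict acts as an ordered set: its keys are the unique values seen
seen = defaultdict(dict)
for record in records:
    for key, value in record.items():
        seen[key][value] = None  # constant-time insert; repeats are ignored

# Convert each ordered set back into a list, keeping first-seen order
unique = {key: list(values) for key, values in seen.items()}
print(unique)
```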
Akshay Sehgal
  • Please read the comments by OP. He clearly mentions that he needs repeated values for "primary email" as well as a unique version. In fact, it's your question itself that he has answered. I don't see the need for a downvote. – Akshay Sehgal Jan 07 '22 at 07:00
  • Primary Email should not be in list form. – Naveen Gupta Jan 07 '22 at 07:02
  • Primary Email should be unique and not in the list. – Naveen Gupta Jan 07 '22 at 07:05
  • Updated answer is more like hard coded. – Naveen Gupta Jan 07 '22 at 07:06
  • How is it hardcoded? How are you deciding which key to keep unique and which to keep as a list?? – Akshay Sehgal Jan 07 '22 at 07:08
  • @BrokenBenchmark, thanks, however I am a bit confused by OP's new comments on how this is "hardcoded", waiting on OP to clarify. – Akshay Sehgal Jan 07 '22 at 07:09
  • @Akshay Sehgal, Can't keep the key in code. Email address values should be in list. – Naveen Gupta Jan 07 '22 at 07:09
  • The email addresses are in a list ... (?) – BrokenBenchmark Jan 07 '22 at 07:11
  • @NaveenGupta, can you also clarify my question? How would you decide, let's say out of 10 keys, which ones to keep as a list and which ones as unique? – Akshay Sehgal Jan 07 '22 at 07:11
  • @NaveenGupta, in order to clarify that you WILL need to explicitly specify which key has to be appended as a list, and which one has to be just updated with the latest value. So this code is NOT hardcoded; it's exactly what you can do with your current explanation of the problem. – Akshay Sehgal Jan 07 '22 at 07:12
  • Thanks @Akshay, I am sorry but I'm not really sure how to decide which key should go to a list. I am checking on that. – Naveen Gupta Jan 07 '22 at 07:13
  • @NaveenGupta In the future, please post a precise, finalized English specification of what you want the output schema to look like. Akshay (as well as the other two answerers) clearly want to help, but none of us can do anything unless we have a clear understanding of what you're asking. – BrokenBenchmark Jan 07 '22 at 07:15
  • @AkshaySehgal, excellent answer. Can you tell me how you get the value `v` in the first code block? (e.g. `d['Email Address Type (Primary/Alternate)'] = v`) – arshovon Jan 07 '22 at 07:16
  • @arsho, thanks a ton for spotting that, fixed it. It was a bug from my previously run scripts that are mentioned in the second half of my answer. – Akshay Sehgal Jan 07 '22 at 07:17
  • @NaveenGupta I have updated my answer further to remove the explicit key naming from the for loops. However, you still have to mention which keys you want appended as a list in `listKeys` – Akshay Sehgal Jan 07 '22 at 07:27

This code snippet produces the expected output using a list comprehension.

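data = [{'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}, {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}, {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'}]

# Collect every email address into one list
email_addresses = [entry['Email Address'] for entry in data]

# Takes the type from the first entry, assuming all entries share it
print({
    'Email Address': email_addresses,
    'Email Address Type (Primary/Alternate)': data[0]['Email Address Type (Primary/Alternate)']
})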
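This answer reads the type from data[0], so it silently assumes every entry shares the same type. A hedged sketch that checks that assumption first and returns the list-wrapped shape from the question; TYPE_KEY is just a local name introduced here for readability:

```python
data = [
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
]
TYPE_KEY = 'Email Address Type (Primary/Alternate)'

# Fail loudly instead of silently using data[0] if the types differ
types = {entry[TYPE_KEY] for entry in data}
if len(types) != 1:
    raise ValueError(f"expected exactly one email type, found {sorted(types)}")

result = [{
    'Email Address': [entry['Email Address'] for entry in data],
    TYPE_KEY: types.pop(),  # the single shared type
}]
print(result)
```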
BrokenBenchmark

A longer solution using plain Python dictionaries:

data = [{'Email Address': 'abc@xyz.com',
         'Email Address Type (Primary/Alternate)': 'Primary Email'},
        {'Email Address': 'abc@xyz.com',
         'Email Address Type (Primary/Alternate)': 'Primary Email'},
        {'Email Address': 'abc@xyz.com',
         'Email Address Type (Primary/Alternate)': 'Primary Email'}]
results = []
for row in data:
    current_type = row['Email Address Type (Primary/Alternate)']
    found_existing_record = False
    # Look for an existing record with the same email type
    for record in results:
        if record['Email Address Type (Primary/Alternate)'] == current_type:
            record['Email Address'].append(row['Email Address'])
            found_existing_record = True
            break
    # First time this type is seen: start a new record
    if not found_existing_record:
        results.append({
            'Email Address': [row['Email Address']],
            'Email Address Type (Primary/Alternate)': current_type
        })
print(results)

Output:

[{'Email Address': ['abc@xyz.com', 'abc@xyz.com', 'abc@xyz.com'], 'Email Address Type (Primary/Alternate)': 'Primary Email'}]

Explanation of this approach:

Here I group by the email address type instead of a hardcoded value. If the type is not yet in the results list, a new dictionary is added with the email address wrapped in a list. If the type is already in the results list, the email address is appended to that record's list.

The code above also works with data like the following:

[{'Email Address': 'abc@xyz.com',
'Email Address Type (Primary/Alternate)': 'Primary Email'},
{'Email Address': 'abc@xyz.com',
'Email Address Type (Primary/Alternate)': 'Primary Email'},
{'Email Address': 'qqq@xyz.com',
'Email Address Type (Primary/Alternate)': 'Alternate Email'}]
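Because the loop above rescans results for every row, a dict keyed by type can replace that repeated search. A minimal sketch of that swap, assuming the mixed Primary/Alternate data shown above; TYPE_KEY and grouped are names introduced for illustration:

```python
data = [
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'abc@xyz.com', 'Email Address Type (Primary/Alternate)': 'Primary Email'},
    {'Email Address': 'qqq@xyz.com', 'Email Address Type (Primary/Alternate)': 'Alternate Email'},
]
TYPE_KEY = 'Email Address Type (Primary/Alternate)'

# Map each email type to its addresses; lookup by type is constant-time
# and dicts keep the order in which types were first seen (Python 3.7+)
grouped = {}
for row in data:
    grouped.setdefault(row[TYPE_KEY], []).append(row['Email Address'])

# Rebuild the list-of-dicts shape used in the question
results = [{'Email Address': emails, TYPE_KEY: email_type}
           for email_type, emails in grouped.items()]
print(results)
```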

Disclaimer:

Better answers have already been posted.

arshovon