I have two lists of dictionaries that share information. First, I want a unique set of names (list_person contains repeated name values), so I build a new list of dictionaries with one entry per name. Then, whenever list_pets['person_id'] matches list_person['id'], I want to append list_pets['pet'] to the 'pets' list of the matching entry in that new list.

For clarification here is my code + desired output:

My current code:

list_person = [{'id': 12345, 'name': 'Bobby Bobs', 'pets': ['cat']}, # you see that name values are repeated
              {'id': 678910, 'name': 'Bobby Bobs', 'pets': ['zebra']},
              {'id': 111213, 'name': 'Lisa Bobs', 'pets': ['horse']},
              {'id': 141516, 'name': 'Lisa Bobs', 'pets': ['rabbit']}]

list_pets = [{'id': 'abcd', 'pet': 'shark', 'person_id': 12345}, #Bobby Bobs' pets
             {'id': 'efgh', 'pet': 'tiger', 'person_id': 678910}, #Bobby Bobs' pets
             {'id': 'ijkl', 'pet': 'elephant', 'person_id': 111213}, #Lisa Bobs' pets
             {'id': 'mnopq', 'pet': 'dog', 'person_id': 141516}] #Lisa Bobs' pets

output = []
for person, pet in zip(list_person, list_pets):
    t = [temp_dict['name'] for temp_dict in output]
    if person['name'] not in t:
        output.append(person)    # make a new list of dicts with unique name values
        for unique_person in output: # if they share ID, add the missing pets. 
            if person['id'] == pet['person_id']:
                unique_person['pets'].append(pet['pet'])
print(output)

Desired output:

desired_out = [{'id': 12345, 'name': 'Bobby Bobs', 'pets': ['cat', 'zebra', 'shark', 'tiger']},
                {'id': 111213, 'name': 'Lisa Bobs', 'pets': ['horse', 'rabbit', 'elephant', 'dog']}]

Current output:

[{'id': 12345, 'name': 'Bobby Bobs', 'pets': ['cat', 'shark', 'elephant']}, {'id': 111213, 'name': 'Lisa Bobs', 'pets': ['horse', 'elephant']}]

My current output does not contain all the correct pets. Why is that, and what would get me closer to the solution?

blah
  • What does your current code output? – M-Chen-3 Apr 06 '21 at 17:42
  • I made an edit to the question with my current output. All of the code above is reproducible btw. :) thanks – blah Apr 06 '21 at 17:44
  • Should id of output be the first occurrence for a name – Rajesh Apr 06 '21 at 17:52
  • This appears to be an [XY Problem](https://en.wikipedia.org/wiki/XY_problem). You've used `dict`s where you should use a data frame; your IDs are not unique. If you design this "properly", this is a simple data frame `merge` and a `groupby`. Is there some system requirement that results in what appears to be poor design decisions? – Prune Apr 06 '21 at 17:53
  • @Rajesh C it doesn't have to be. The important thing is to have the name once in the list of dicts, and all the pets related to that person. In other words, no info loss. – blah Apr 06 '21 at 17:53
  • @Prune, if you could provide with an example with what you've just described I would appreciate it. In the end I do need a list of dicts, as I am dealing with Pysolr and populating a Solr database. The data I provide here is ofc made up for the question. – blah Apr 06 '21 at 17:58
  • @Prune the id's ARE unique btw. In my real script I use UUID (btw). Some of the ID's link between different list of dictionaries as a way of linking information – blah Apr 06 '21 at 18:11
  • You label those as `person_id`, but you have multiple IDs for each person. Thus, they are not unique as labeled. – Prune Apr 06 '21 at 18:13
  • "Provide me an example" is out of scope for Stack Overflow, as such generic examples exist in almost any tutorial on PANDAS data frames. – Prune Apr 06 '21 at 18:14
  • Is there any guarantee of order connection between `list_person` and `list_pets`? Bobby has the first two records in both `list_person` and `list_pets`, or is that just a coincidence in the example? – aneroid Apr 06 '21 at 18:14

2 Answers

import itertools

import pandas as pd

person_df = pd.DataFrame(list_person)
pets_df = pd.DataFrame(list_pets).drop(columns = ['id'])
joined_df = person_df.merge(pets_df, left_on = ['id'], right_on = ['person_id'])

Joined df:

>>> joined_df
       id        name      pets       pet  person_id
0   12345  Bobby Bobs     [cat]     shark      12345
1  678910  Bobby Bobs   [zebra]     tiger     678910
2  111213   Lisa Bobs   [horse]  elephant     111213
3  141516   Lisa Bobs  [rabbit]       dog     141516

Now combine the pets and pet columns row by row, then group by name:

joined_df['pets'] = [pets + [pet] for pets, pet in zip(joined_df['pets'], joined_df['pet'])]
final_list = joined_df.groupby('name', as_index = False).agg(
                                  id = ('id', 'first'), 
                                  pets = ('pets', lambda x: list(itertools.chain(*x)))
                                ).to_dict('records')

Output:

>>> final_list
 [{'name': 'Bobby Bobs', 'id': 12345, 'pets': ['cat', 'shark', 'zebra', 'tiger']}, 
{'name': 'Lisa Bobs', 'id': 111213, 'pets': ['horse', 'elephant', 'rabbit', 'dog']}]
Amit Vikram Singh

Here's a non-pandas solution, and it doesn't rely on an order-relation between list_person (aka 'people') and list_pets. So I'm not assuming that Bobby's data is the first two entries in both lists.

Initially, output will be a mapping of names to each person's data, including pets. A second dict, ids, links each of a person's different IDs to that same data by intentionally storing a reference to the data dict rather than a copy.

Note that when a person is added to output, it is done as a deepcopy so that it doesn't affect the original item in list_person.

import copy

output = {}  # dict, not list
ids = {}  # needed to match with pets which has person_id

for person in list_person:
    if (name := person['name']) in output:
        output[name]['pets'].extend(person['pets'])
        output[name]['id'].append(person['id'])
        ids[person['id']] = output[name]  # intentionally a reference, not a copy
    else:
        output[name] = copy.deepcopy(person)  # so that the pet list is created as a copy
        output[name]['id'] = [person['id']]  # turn the id into a list
        ids[person['id']] = output[name]  # intentionally a reference, not a copy

for pet in list_pets:
    # several keys in the ids dict can reference the same object,
    # so use that to our advantage by appending directly to its 'pets' list
    ids[pet['person_id']]['pets'].append(pet['pet'])

output is now:

{'Bobby Bobs': {'id': [12345, 678910],
                'name': 'Bobby Bobs',
                'pets': ['cat', 'zebra', 'shark', 'tiger']},
 'Lisa Bobs': {'id': [111213, 141516],
               'name': 'Lisa Bobs',
               'pets': ['horse', 'rabbit', 'elephant', 'dog']}
}
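
The pets loop updates output even though it only touches ids because both dicts hold references to the same objects. A minimal sketch of that behaviour in isolation (the miniature `person`, `merged` and `by_id` names are made up for illustration and are not part of the code above):

```python
import copy

# Hypothetical miniature record, used only to illustrate shared references.
person = {'id': 1, 'name': 'Bobby Bobs', 'pets': ['cat']}

merged = copy.deepcopy(person)     # independent copy; the input stays untouched
by_id = {1: merged, 2: merged}     # two keys pointing at one dict object

by_id[2]['pets'].append('shark')   # mutate through one of the aliases

print(by_id[1] is merged)          # the lookup returns the same object, not a copy
print(merged['pets'])              # the append is visible through every alias
print(person['pets'])              # the deep-copied original is unchanged
```

Had ids stored copies instead, the appended pets would never reach output.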

Final step to make it a list and only use one id for each person:

output = list(output.values())
for entry in output:
    entry['id'] = entry['id'][0]  # just the first id

Final output:

[{'id': 12345,
  'name': 'Bobby Bobs',
  'pets': ['cat', 'zebra', 'shark', 'tiger']},
 {'id': 111213,
  'name': 'Lisa Bobs',
  'pets': ['horse', 'rabbit', 'elephant', 'dog']}]

And if you don't mind multiple ids, skip the last step above and leave it at output = list(output.values()).
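
For reuse, the same idea could be wrapped in a function. This is only a sketch under some assumptions of mine: the function name `merge_people_and_pets` is made up, it keeps the first id seen for each name, and it silently skips pets whose person_id matches nobody. It avoids the walrus operator, so it should also run on Python versions before 3.8.

```python
import copy

def merge_people_and_pets(people, pets):
    """Group people by name and attach every pet linked to any of their ids."""
    by_name = {}  # name -> merged record (dicts preserve insertion order)
    by_id = {}    # person id -> the same merged record (shared reference)
    for person in people:
        record = by_name.get(person['name'])
        if record is None:
            record = copy.deepcopy(person)  # leave the input list untouched
            by_name[person['name']] = record
        else:
            record['pets'].extend(person['pets'])
        by_id[person['id']] = record
    for pet in pets:
        # Assumption: a pet with an unknown person_id is skipped, not an error.
        record = by_id.get(pet['person_id'])
        if record is not None:
            record['pets'].append(pet['pet'])
    return list(by_name.values())

list_person = [{'id': 12345, 'name': 'Bobby Bobs', 'pets': ['cat']},
               {'id': 678910, 'name': 'Bobby Bobs', 'pets': ['zebra']},
               {'id': 111213, 'name': 'Lisa Bobs', 'pets': ['horse']},
               {'id': 141516, 'name': 'Lisa Bobs', 'pets': ['rabbit']}]

list_pets = [{'id': 'abcd', 'pet': 'shark', 'person_id': 12345},
             {'id': 'efgh', 'pet': 'tiger', 'person_id': 678910},
             {'id': 'ijkl', 'pet': 'elephant', 'person_id': 111213},
             {'id': 'mnopq', 'pet': 'dog', 'person_id': 141516}]

result = merge_people_and_pets(list_person, list_pets)
print(result)
```

Because the id lookup is built from list_person itself, the result does not depend on the two lists being in matching order.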

aneroid