Comparing 2 lists of dictionaries and returning changed values as nested dicts

Question

I have 2 lists of dicts as follows:

last_data = [{'country': 'USA', 'cases': 10425, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
            {'country': 'Australia', 'cases': 3045, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
            {'country': 'Germany', 'cases': 6025, 'deaths': 1704, 'recovered': 2525, 'active': 100027}]

current_data = [{'country': 'USA', 'cases': 10425, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
                {'country': 'Australia', 'cases': 3046, 'deaths': 1704, 'recovered': 2525, 'active': 100028},
                {'country': 'Germany', 'cases': 6026, 'deaths': 1706, 'recovered': 2525, 'active': 100026}]

I am trying to achieve the output below by comparing these 2 lists:

expected_output = [{'Australia':{'last_cases_value': 3045,'updated_cases_value':3046,'change':1,
                                {'last_active_value':100027,'updated_active_value':10028,'change':1}
                   {'Germany':{'last_cases_value': 6025,'updated_cases_value':6026,'change':1},
                              {'last_death_value':1704,'updated_death_value':1706,'change':2},
                              {'last_active_value':100027,'updated_active_value':10026,'change':-1}]

I have tried many things but the one which got me sort of close is

pairs = zip(last_data, current_data)
print([(x, y) for x, y in pairs if x != y])

output of above code is as follows

[({'country': 'Australia', 'cases': 3045, 'deaths': 1704, 'recovered': 2525, 'active': 100027}, {'country': 'Australia', 'cases': 3046, 'deaths': 1704, 'recovered': 2525, 'active': 100028}), ({'country': 'Germany', 'cases': 6025, 'deaths': 1704, 'recovered': 2525, 'active': 100027}, {'country': 'Germany', 'cases': 6026, 'deaths': 1706, 'recovered': 2525, 'active': 100026})]

I still can't figure out how to convert this to my expected output.

I am using Python 3.8.

Any help will be greatly appreciated.

Does this answer your question? [Comparing Python dictionaries and nested dictionaries](https://stackoverflow.com/questions/27265939/comparing-python-dictionaries-and-nested-dictionaries) — dspencer, Mar 28 '20 at 13:46
Should 'USA' also be in the expected output? If not, what do you filter on? — kederrac, Mar 28 '20 at 15:35

RoadRunner · Accepted Answer · 2020-03-28T21:28:10.237

Your expected output is not valid. I would probably go for a nested dictionary like this instead:

{
    "Germany": {
        "cases": {
            "last_cases_value": 6025,
            "updated_cases_value": 6026,
            "change": 1
        },
        "active": {
            "last_active_value": 100027,
            "updated_active_value": 100026,
            "change": -1
        },
        "deaths": {
            "last_deaths_value": 1704,
            "updated_deaths_value": 1706,
            "change": 2
        }
    },
    "Australia": {
        "cases": {
            "last_cases_value": 3045,
            "updated_cases_value": 3046,
            "change": 1
        },
        "active": {
            "last_active_value": 100027,
            "updated_active_value": 100028,
            "change": 1
        }
    }
}

To get the above, I would first convert your list of dictionaries into nested dictionaries where 'country' is the key:

last_data = [{'country': 'USA', 'cases': 10425, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
             {'country': 'Australia', 'cases': 3045, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
             {'country': 'Germany', 'cases': 6025, 'deaths': 1704, 'recovered': 2525, 'active': 100027}]

current_data = [{'country': 'USA', 'cases': 10425, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
                {'country': 'Australia', 'cases': 3046, 'deaths': 1704, 'recovered': 2525, 'active': 100028},
                {'country': 'Germany', 'cases': 6026, 'deaths': 1706, 'recovered': 2525, 'active': 100026}]

def list_dicts_to_nested_dict(key, lst):
    return {dic[key]: {k: v for k, v in dic.items() if k != key} for dic in lst}

last_data_dict = list_dicts_to_nested_dict('country', last_data)
# {'USA': {'cases': 10425, 'deaths': 1704, 'recovered': 2525, 'active': 100027}, 'Australia': {'cases': 3045, 'deaths': 1704, 'recovered': 2525, 'active': 100027}, 'Germany': {'cases': 6025, 'deaths': 1704, 'recovered': 2525, 'active': 100027}}

current_data_dict = list_dicts_to_nested_dict('country', current_data)
# {'USA': {'cases': 10425, 'deaths': 1704, 'recovered': 2525, 'active': 100027}, 'Australia': {'cases': 3046, 'deaths': 1704, 'recovered': 2525, 'active': 100028}, 'Germany': {'cases': 6026, 'deaths': 1706, 'recovered': 2525, 'active': 100026}}

This is also a good idea because looking up a specific country's data becomes O(1), instead of the O(N) needed to scan the whole list. It also makes it easier to intersect the countries in the future, which I will show below.
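To illustrate the lookup and intersection points, here is a minimal sketch. The trimmed-down data and the extra 'France' entry are made up for illustration and are not part of the question's data.

```python
# Hypothetical, trimmed-down snapshots; 'France' only exists in the newer one
last = {'USA': {'cases': 10425}, 'Germany': {'cases': 6025}}
current = {'USA': {'cases': 10425}, 'Germany': {'cases': 6026}, 'France': {'cases': 10}}

# Direct lookup by country, no scan over a list needed
germany_cases = current['Germany']['cases']

# dict.keys() returns a set-like view, so & gives the countries present in both
common = last.keys() & current.keys()
# ... and - gives the countries that only appear in the newer snapshot
new_countries = current.keys() - last.keys()

print(germany_cases)          # 6026
print(sorted(common))         # ['Germany', 'USA']
print(sorted(new_countries))  # ['France']
```

Iterating over the intersection means a country missing from either snapshot is simply skipped rather than raising a KeyError.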

Then add the changed data to a nested two-depth collections.defaultdict of dict, since it handles initializing new keys for you. You can have a look at this Nested defaultdict of defaultdict answer for more information and other ways of doing this.

from collections import defaultdict

result = defaultdict(lambda: defaultdict(dict))

# Get the intersecting keys.
# Avoids Key Errors in the future, if both dictionaries don't have the same key
for country in last_data_dict.keys() & current_data_dict.keys():

    # Only deal with dictionaries that have changed
    if last_data_dict[country] != current_data_dict[country]:

        # Get intersecting keys between both dictionaries
        for key in last_data_dict[country].keys() & current_data_dict[country].keys():

            # Calculate the change between updated and previous data
            change = current_data_dict[country][key] - last_data_dict[country][key]

            # We only care about data that has changed
            # Insert data into dictionary
            if change != 0:
                result[country][key][f"last_{key}_value"] = last_data_dict[country][key]
                result[country][key][f"updated_{key}_value"] = current_data_dict[country][key]
                result[country][key]["change"] = change

Then you can serialize and output the above data as a JSON-formatted string with json.dumps, since it's easier to output a nested defaultdict this way than to convert the whole data structure to dict recursively or by some other method. defaultdict is a subclass of dict anyway, so it can be treated like a normal dictionary.

from json import dumps

print(dumps(result, indent=4))
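As a small self-contained illustration of that point, json.dumps accepts a nested defaultdict directly, and json.loads on the resulting string gives back plain dicts. The sample entry below is made up, and the round trip is only a sketch: it assumes every key is a string and every value is JSON-serializable.

```python
from collections import defaultdict
from json import dumps, loads

# Hypothetical one-entry structure with the same shape as the result
nested = defaultdict(lambda: defaultdict(dict))
nested['Australia']['cases']['change'] = 1

# defaultdict is a dict subclass, so it serializes like a normal dict
text = dumps(nested, indent=4)
print(text)

# Parsing the JSON string back produces ordinary dicts at every level
plain = loads(text)
print(type(plain['Australia']))  # <class 'dict'>
```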

Additionally, if you don't care about how the output looks, then printing the defaultdict directly is an easy option as well:

print(result)
# defaultdict(<function <lambda> at 0x000002355BC3AA60>, {'Australia': defaultdict(<class 'dict'>, {'cases': {'last_cases_value': 3045, 'updated_cases_value': 3046, 'change': 1}, 'active': {'last_active_value': 100027, 'updated_active_value': 100028, 'change': 1}}), 'Germany': defaultdict(<class 'dict'>, {'deaths': {'last_deaths_value': 1704, 'updated_deaths_value': 1706, 'change': 2}, 'cases': {'last_cases_value': 6025, 'updated_cases_value': 6026, 'change': 1}, 'active': {'last_active_value': 100027, 'updated_active_value': 100026, 'change': -1}})})

As an optional extra step, as mentioned above, we could create a recursive function to convert the nested defaultdict to a normal dictionary whose sub-levels are also of type dict:

from pprint import pprint

def defaultdict_to_dict(dd):
    result = {}

    for k, v in dd.items():
        # Recurse into nested defaultdicts; keep every other value as it is
        result[k] = defaultdict_to_dict(v) if isinstance(v, defaultdict) else v

    return result

pprint(defaultdict_to_dict(result))

Which works as intended:

{'Australia': {'active': {'change': 1,
                          'last_active_value': 100027,
                          'updated_active_value': 100028},
               'cases': {'change': 1,
                         'last_cases_value': 3045,
                         'updated_cases_value': 3046}},
 'Germany': {'active': {'change': -1,
                        'last_active_value': 100027,
                        'updated_active_value': 100026},
             'cases': {'change': 1,
                       'last_cases_value': 6025,
                       'updated_cases_value': 6026},
             'deaths': {'change': 2,
                        'last_deaths_value': 1704,
                        'updated_deaths_value': 1706}}}

You can have a look at the full implementation on ideone.com.
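If the conversion step feels like overhead, one possible variation (a sketch, not the implementation linked above) builds plain dicts directly with dict.setdefault. The function name diff_snapshots is hypothetical, and the sketch assumes every field other than the key is numeric.

```python
def diff_snapshots(last, current, key='country'):
    """Return {country: {field: {...}}} for every field whose value changed."""
    last_by_key = {d[key]: d for d in last}
    result = {}
    for cur in current:
        prev = last_by_key.get(cur[key])
        if prev is None:
            continue  # only in the newer snapshot; skipped in this sketch
        # Compare only the fields both snapshots have
        for field in cur.keys() & prev.keys():
            if field == key:
                continue
            change = cur[field] - prev[field]
            if change != 0:
                # setdefault creates the per-country dict on first use
                result.setdefault(cur[key], {})[field] = {
                    f'last_{field}_value': prev[field],
                    f'updated_{field}_value': cur[field],
                    'change': change,
                }
    return result

last_data = [{'country': 'USA', 'cases': 10425, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
             {'country': 'Australia', 'cases': 3045, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
             {'country': 'Germany', 'cases': 6025, 'deaths': 1704, 'recovered': 2525, 'active': 100027}]

current_data = [{'country': 'USA', 'cases': 10425, 'deaths': 1704, 'recovered': 2525, 'active': 100027},
                {'country': 'Australia', 'cases': 3046, 'deaths': 1704, 'recovered': 2525, 'active': 100028},
                {'country': 'Germany', 'cases': 6026, 'deaths': 1706, 'recovered': 2525, 'active': 100026}]

changes = diff_snapshots(last_data, current_data)
print(changes['Germany']['deaths'])
# {'last_deaths_value': 1704, 'updated_deaths_value': 1706, 'change': 2}
```

Because the result contains only plain dicts, it can be passed to pprint or json.dumps without any conversion.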

@RoadRunner Thank you so much for the detailed explanation. I have a question: why is the expected output not a valid dictionary? Am I missing a `,` or something? — kkoc3, Mar 28 '20 at 20:48
@kkoc3 Your expected output is not valid because it is missing a `}` from `{'Australia':{'last_cases_value': 3045,'updated_cases_value':3046,'change':1,`. If I try to run your expected output in a python shell I get `SyntaxError: invalid syntax`. I showed a much easier way to structure your data anyways, so I'm glad it helped :). — RoadRunner, Mar 28 '20 at 21:18

kederrac · Answer 2 · 2020-03-28T16:34:16.097

You can use list and dictionary comprehensions:

l = {l['country']: {'v': l['cases'], 'a': l['active']} for l in last_data}
c = {l['country']: {'v': l['cases'], 'a': l['active']} for l in current_data}

result = [{k: [{'last_cases_value': l[k]['v'], 
               'updated_cases_value': c[k]['v'],
               'change': c[k]['v'] - l[k]['v']},
               {'last_active_value': l[k]['a'], 
               'updated_active_value': c[k]['a'],
               'change': c[k]['a'] - l[k]['a']}]} for k in c.keys()]

output:

[{'USA': [{'last_cases_value': 10425,
    'updated_cases_value': 10425,
    'change': 0},
   {'last_active_value': 100027,
    'updated_active_value': 100027,
    'change': 0}]},
 {'Australia': [{'last_cases_value': 3045,
    'updated_cases_value': 3046,
    'change': 1},
   {'last_active_value': 100027,
    'updated_active_value': 100028,
    'change': 1}]},
 {'Germany': [{'last_cases_value': 6025,
    'updated_cases_value': 6026,
    'change': 1},
   {'last_active_value': 100027,
    'updated_active_value': 100026,
    'change': -1}]}]

If you want to keep in the result only those countries whose stats have changed:

result = [{k: [{'last_cases_value': l[k]['v'], 
               'updated_cases_value': c[k]['v'],
               'change': c[k]['v'] - l[k]['v']},
               {'last_active_value': l[k]['a'], 
               'updated_active_value': c[k]['a'],
               'change': c[k]['a'] - l[k]['a']}]}
          for k in c.keys() if c[k]['a'] != l[k]['a'] or c[k]['v'] != l[k]['v']]

output:

[{'Australia': [{'last_cases_value': 3045,
    'updated_cases_value': 3046,
    'change': 1},
   {'last_active_value': 100027,
    'updated_active_value': 100028,
    'change': 1}]},
 {'Germany': [{'last_cases_value': 6025,
    'updated_cases_value': 6026,
    'change': 1},
   {'last_active_value': 100027,
    'updated_active_value': 100026,
    'change': -1}]}]
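How the two conditions in such a filter are combined matters when only one stat changes. A minimal sketch with a made-up country ('Italy' and its numbers are hypothetical), reusing the l and c naming from above:

```python
# Hypothetical snapshot pair: only 'cases' ('v') changes, 'active' ('a') does not
l = {'Italy': {'v': 100, 'a': 50}}
c = {'Italy': {'v': 105, 'a': 50}}

# 'and' keeps a country only if both stats changed
both_changed = [k for k in c if c[k]['v'] != l[k]['v'] and c[k]['a'] != l[k]['a']]
# 'or' keeps a country if at least one stat changed
any_changed = [k for k in c if c[k]['v'] != l[k]['v'] or c[k]['a'] != l[k]['a']]

print(both_changed)  # []
print(any_changed)   # ['Italy']
```

With the question's data both forms give the same result, since Australia and Germany each changed in both cases and active.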

I'm assuming that because `USA` has no changes, it should not be included in the output. I ended up just suggesting an alternative output the OP should go for. — RoadRunner, Mar 28 '20 at 16:17
Thanks @kederrac for helping me with this. I really appreciate it. — kkoc3, Mar 28 '20 at 20:58

Grzegorz Skibinski · Answer 3 · 2020-03-28T15:10:26.407

You can use pandas:

import pandas as pd

curr_df=pd.DataFrame(current_data)
last_df=pd.DataFrame(last_data)

df=curr_df.merge(last_df, on="country", suffixes=["_updated", "_last"])
res=df[sorted(df.columns)].rename(columns={c: "_".join(c.split("_")[::-1])+"_value" for c in df.columns if c!="country"}).set_index('country').to_dict('index')

res=[dict([(k, [dict(p) for p in list(zip(v.items(), list(v.items())[1:]))[::2]])]) for k, v in res.items()]

Output:

[{'USA': [{'last_active_value': 100027, 'updated_active_value': 100027}, {'last_cases_value': 10425, 'updated_cases_value': 10425}, {'last_deaths_value': 1704, 'updated_deaths_value': 1704}, {'last_recovered_value': 2525, 'updated_recovered_value': 2525}]}, {'Australia': [{'last_active_value': 100027, 'updated_active_value': 100028}, {'last_cases_value': 3045, 'updated_cases_value': 3046}, {'last_deaths_value': 1704, 'updated_deaths_value': 1704}, {'last_recovered_value': 2525, 'updated_recovered_value': 2525}]}, {'Germany': [{'last_active_value': 100027, 'updated_active_value': 100026}, {'last_cases_value': 6025, 'updated_cases_value': 6026}, {'last_deaths_value': 1704, 'updated_deaths_value': 1706}, {'last_recovered_value': 2525, 'updated_recovered_value': 2525}]}]

Note: merge does an inner join by default. If you want a full outer or left outer join, have a look at its how argument:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

The zip idiom used to reshape the list pairwise was taken from here: https://stackoverflow.com/a/5394908/11610186

Yup, I got a bit confused by the ```expected_output``` look - now it matches — Grzegorz Skibinski, Mar 28 '20 at 15:11
