Merge two arrays by collections of two elements

Question

I have an array containing an even number of integers. The array represents a sequence of (identifier, count) tuples, which have already been sorted by identifier. I would like to merge several of these arrays, summing the counts of identifiers that appear in more than one of them. I have thought of a few ways to do it, but they are fairly complicated, and I feel there might be an easy way to do this in Python.

The layout is:

[<id>, <count>, <id>, <count>]

Input:

[14, 1, 16, 4, 153, 21]
[14, 2, 16, 3, 18, 9]

Output:

[14, 3, 16, 7, 18, 9, 153, 21]

The tuples are not Python tuples, just contiguous elements, right? — imreal, Jan 03 '13 at 18:02
I'm new to Python; if there is an easy way to convert this data to a dictionary, I'm all for it. I'm getting the data from a JSON packet. There's a fair amount of data, so I transferred it like this to save space. — charliehorse55, Jan 03 '13 at 18:03
Are the ids in a given array guaranteed to be unique, or can they show up multiple times in a single array? How big are these arrays generally? — Travis Griggs, Jan 03 '13 at 18:05
The ids are unique. I have to merge 10-20 arrays, each 5k to 50k elements long. — charliehorse55, Jan 03 '13 at 18:06

score 8 · Accepted Answer · edited May 23 '17 at 12:18

It would be better to store these as dictionaries than as lists (not just for this purpose, but for other use cases, such as extracting the value of a single ID):

x1 = [14, 1, 16, 4, 153, 21]
x2 = [14, 2, 16, 3, 18, 9]

# turn into dictionaries (could write a function to convert)
d1 = dict([(x1[i], x1[i + 1]) for i in range(0, len(x1), 2)])
d2 = dict([(x2[i], x2[i + 1]) for i in range(0, len(x2), 2)])

print(d1)
# {14: 1, 16: 4, 153: 21}
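The comment in the code above suggests wrapping the conversion in a function. A minimal sketch of such a helper, pairing each id with the count after it via slicing and zip instead of index arithmetic; the name pairs_to_dict and the even-length check are illustrative assumptions, not part of the original answer.

```python
def pairs_to_dict(flat):
    """Convert [id, count, id, count, ...] into {id: count, ...}.

    pairs_to_dict is a hypothetical helper name; it assumes the list has
    an even length, as stated in the question.
    """
    if len(flat) % 2:
        raise ValueError("expected an even number of elements")
    # Even indices hold ids, odd indices hold the matching counts.
    return dict(zip(flat[0::2], flat[1::2]))


d1 = pairs_to_dict([14, 1, 16, 4, 153, 21])
d2 = pairs_to_dict([14, 2, 16, 3, 18, 9])
print(d1)  # {14: 1, 16: 4, 153: 21}
```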

After that, you could use any of the solutions in this question to add them together. For example (taken from the first answer):

import collections

def d_sum(a, b):
    d = collections.defaultdict(int, a)
    for k, v in b.items():
        d[k] += v
    return dict(d)

print(d_sum(d1, d2))
# {14: 3, 16: 7, 153: 21, 18: 9}
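The question involves 10-20 arrays rather than two, and shows the result in the original flat, id-sorted format. A minimal sketch, assuming every array has already been converted to a dict as above: fold d_sum over all of them with functools.reduce, then flatten back by sorting on id. The helper name to_flat is hypothetical.

```python
import collections
from functools import reduce


def d_sum(a, b):
    # Same approach as above: copy a into a defaultdict, then add b's counts.
    d = collections.defaultdict(int, a)
    for k, v in b.items():
        d[k] += v
    return dict(d)


def to_flat(d):
    """Turn {id: count} back into a flat [id, count, ...] list sorted by id.

    to_flat is a hypothetical helper name used only for this sketch.
    """
    return [x for k in sorted(d) for x in (k, d[k])]


dicts = [
    {14: 1, 16: 4, 153: 21},
    {14: 2, 16: 3, 18: 9},
]
# Start from an empty dict so any number of inputs (even zero) works.
merged = reduce(d_sum, dicts, {})
print(to_flat(merged))  # [14, 3, 16, 7, 18, 9, 153, 21]
```

Each d_sum call copies the running total, so for many large inputs a single defaultdict updated in a loop avoids repeated copying.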

Martijn Pieters · Answer 2 · 2013-01-03T18:20:37.710

Use collections.Counter:

import itertools
import collections

def grouper(n, iterable, fillvalue=None):
    # Collect data into fixed-length chunks: grouper(2, 'ABCD') -> ('A', 'B'), ('C', 'D')
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

lst1 = [14, 1, 16, 4, 153, 21]
lst2 = [14, 2, 16, 3, 18, 9]

count1 = collections.Counter(dict(grouper(2, lst1)))
count2 = collections.Counter(dict(grouper(2, lst2)))
result = count1 + count2

I've used the grouper recipe from the itertools documentation to convert your data to dictionaries, but as other answers have shown, there are more ways to skin that particular cat.

result is a Counter with each id pointing to a total count:

Counter({153: 21, 18: 9, 16: 7, 14: 3})

Counters are multisets: they keep a count for each key, which makes them a much better fit for your data than flat lists. They also support addition, as used above, among other operations.
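With 10-20 arrays, one option is to accumulate into a single Counter with Counter.update, which adds counts from a mapping instead of replacing them. A minimal sketch; the sample lists are the two from the question. Note one difference from `+`: addition drops keys whose total is zero or negative, while update keeps every key, which makes no difference for the non-negative counts shown here.

```python
import collections

lists = [
    [14, 1, 16, 4, 153, 21],
    [14, 2, 16, 3, 18, 9],
]

total = collections.Counter()
for flat in lists:
    # zip(it, it) pulls two items at a time from one iterator: (id, count).
    it = iter(flat)
    # Pass a dict, not the raw pairs: given an iterable, update() would
    # count each (id, count) tuple as an element instead of adding counts.
    total.update(dict(zip(it, it)))

print(total)  # Counter({153: 21, 18: 9, 16: 7, 14: 3})
```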

Ashwini Chaudhary · Answer 3 · 2013-01-03T18:18:41.883

collections.Counter() is what you need here:

In [21]: lis1=[14, 1, 16, 4, 153, 21]

In [22]: lis2=[14, 2, 16, 3, 18, 9]

In [23]: from collections import Counter

In [24]: dic1=Counter(dict(zip(lis1[0::2],lis1[1::2])))

In [25]: dic2=Counter(dict(zip(lis2[0::2],lis2[1::2])))

In [26]: dic1+dic2
Out[26]: Counter({153: 21, 18: 9, 16: 7, 14: 3})

Or:

In [51]: it1=iter(lis1)

In [52]: it2=iter(lis2)

In [53]: dic1=Counter(dict((next(it1),next(it1)) for _ in range(len(lis1)//2)))
In [54]: dic2=Counter(dict((next(it2),next(it2)) for _ in range(len(lis2)//2)))
In [55]: dic1+dic2
Out[55]: Counter({153: 21, 18: 9, 16: 7, 14: 3})
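Since the question says each array is already sorted by id, a different technique from the dict-based answers is also possible: a k-way merge with heapq.merge followed by itertools.groupby, which produces the result directly in the question's flat, sorted format. A minimal sketch; merge_sorted_pairs is a hypothetical name, and it assumes every input really is sorted by id with unique ids per array (an unsorted input would give silently wrong totals).

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_pairs(*flat_lists):
    """Merge [id, count, ...] lists that are each sorted by id.

    merge_sorted_pairs is a hypothetical name for this sketch.
    """
    # Pair each id with the count that follows it: (id, count) tuples.
    streams = [zip(flat[0::2], flat[1::2]) for flat in flat_lists]
    # heapq.merge yields tuples from all streams in ascending id order,
    # so equal ids from different lists end up adjacent.
    merged = heapq.merge(*streams, key=itemgetter(0))
    result = []
    for key, group in groupby(merged, key=itemgetter(0)):
        result.extend((key, sum(count for _, count in group)))
    return result


print(merge_sorted_pairs([14, 1, 16, 4, 153, 21], [14, 2, 16, 3, 18, 9]))
# [14, 3, 16, 7, 18, 9, 153, 21]
```

The dict and Counter approaches are simpler and do not depend on ordering; this version trades that for never building per-array dicts.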

reptilicus · Answer 4 · 2013-01-03T20:11:20.190

All of the previous answers look good, but I think the JSON should be well structured to begin with; in my experience, ad hoc formats can cause serious problems down the road, during debugging and elsewhere. With id and count as the fields, the JSON should look like:

[{"id":1, "count":10}, {"id":2, "count":10}, {"id":1, "count":5}, ...]

Properly formed JSON like that is much easier to deal with, and probably similar to what you have coming in anyway.
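A minimal sketch of consuming that shape with the standard json module, assuming the payload is a JSON array of objects with "id" and "count" fields. The three-record payload below is made-up sample data in that shape; because ids can repeat and need not be sorted, the counts are accumulated per id in a Counter.

```python
import json
from collections import Counter

# Hypothetical sample payload in the record-per-object shape shown above.
payload = '[{"id": 1, "count": 10}, {"id": 2, "count": 10}, {"id": 1, "count": 5}]'

totals = Counter()
for record in json.loads(payload):
    # Repeated ids simply add to the running total for that id.
    totals[record["id"]] += record["count"]

print(dict(totals))  # {1: 15, 2: 10}
```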

This class is fairly general, but easy to extend:

from itertools import groupby


class ListOfDicts:
    def __init__(self, listofD=None):
        # Fall back to an empty list when no records are given
        self.list = listofD if listofD is not None else []

    def key_total(self, group_by_key, aggregate_key):
        """Aggregate a list of dicts by a specific key and aggregation key."""
        out_dict = {}
        # groupby only merges *adjacent* records with equal keys,
        # so self.list should already be sorted by group_by_key
        for k, g in groupby(self.list, key=lambda r: r[group_by_key]):
            print(k)
            total = 0
            for record in g:
                print("   ", record)
                total += record[aggregate_key]
            out_dict[k] = total
        return out_dict


if __name__ == "__main__":
    z = ListOfDicts([{'id': 1, 'count': 2, 'junk': 2},
                     {'id': 1, 'count': 4, 'junk': 2},
                     {'id': 1, 'count': 6, 'junk': 2},
                     {'id': 2, 'count': 2, 'junk': 2},
                     {'id': 2, 'count': 3, 'junk': 2},
                     {'id': 2, 'count': 3, 'junk': 2},
                     {'id': 3, 'count': 10, 'junk': 2},
                     ])

    totals = z.key_total("id", "count")
    print(totals)

Which prints:

1
    {'id': 1, 'count': 2, 'junk': 2}
    {'id': 1, 'count': 4, 'junk': 2}
    {'id': 1, 'count': 6, 'junk': 2}
2
    {'id': 2, 'count': 2, 'junk': 2}
    {'id': 2, 'count': 3, 'junk': 2}
    {'id': 2, 'count': 3, 'junk': 2}
3
    {'id': 3, 'count': 10, 'junk': 2}
{1: 12, 2: 8, 3: 10}
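key_total relies on itertools.groupby, which only groups consecutive records with the same key; the example input happens to be sorted by id, but the JSON example earlier has id 1 appearing non-adjacently. A short sketch of what goes wrong with unsorted records and the usual fix of sorting by the same key first. The records are made-up sample data.

```python
from itertools import groupby
from operator import itemgetter

records = [
    {'id': 2, 'count': 2},
    {'id': 1, 'count': 4},
    {'id': 2, 'count': 3},
]
by_id = itemgetter('id')

# Unsorted: id 2 forms two separate runs, and the dict keeps only the
# last run's total, so the first count for id 2 is lost.
unsorted_totals = {k: sum(r['count'] for r in g)
                   for k, g in groupby(records, key=by_id)}
print(unsorted_totals)  # {2: 3, 1: 4}

# Sorting by the grouping key first makes equal ids adjacent.
sorted_totals = {k: sum(r['count'] for r in g)
                 for k, g in groupby(sorted(records, key=by_id), key=by_id)}
print(sorted_totals)  # {1: 4, 2: 5}
```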

I have 50k elements. I am avoiding JSON in that style because it would increase the size of the message by a factor of 10. — charliehorse55, Jan 03 '13 at 20:03
Yeah, that might be taxing on the server; it depends on how frequently it's getting hit. From my experience, though, nicely formed JSON makes life better... — reptilicus, Jan 03 '13 at 20:16
