
I have an array containing an even number of integers. The array represents pairs of an identifier and a count, and the pairs are already sorted by identifier. I would like to merge a few of these arrays together, summing the counts for identifiers that appear in more than one array. I have thought of a few ways to do it, but they are fairly complicated, and I feel there might be an easy way to do this in Python.

That is, each array has the form:

[<id>, <count>, <id>, <count>]

Input:

[14, 1, 16, 4, 153, 21]
[14, 2, 16, 3, 18, 9]

Output:

[14, 3, 16, 7, 18, 9, 153, 21]
charliehorse55

4 Answers


It would be better to store these as dictionaries than as lists (not just for this purpose, but for other use cases, such as extracting the value of a single ID):

x1 = [14, 1, 16, 4, 153, 21]
x2 = [14, 2, 16, 3, 18, 9]

# turn into dictionaries (could write a function to convert)
d1 = {x1[i]: x1[i + 1] for i in range(0, len(x1), 2)}
d2 = {x2[i]: x2[i + 1] for i in range(0, len(x2), 2)}

print(d1)
# {14: 1, 16: 4, 153: 21}
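The comment in the code above suggests writing a function to do the conversion. A minimal sketch of one, under a hypothetical name `pairs_to_dict`; it assumes each id appears at most once per array (a repeated id would keep only its last count) and rejects odd-length input instead of silently dropping the last id:

```python
def pairs_to_dict(flat):
    """Convert a flat [id, count, id, count, ...] list into {id: count}.

    pairs_to_dict is a hypothetical helper name. It assumes each id appears
    at most once per list; a repeated id would keep only its last count.
    """
    if len(flat) % 2 != 0:
        # An odd length means the final id has no count; fail loudly
        raise ValueError("expected an even number of elements, got %d" % len(flat))
    # Even indices hold ids, odd indices hold the matching counts
    return dict(zip(flat[0::2], flat[1::2]))


d1 = pairs_to_dict([14, 1, 16, 4, 153, 21])
d2 = pairs_to_dict([14, 2, 16, 3, 18, 9])
print(d1)  # {14: 1, 16: 4, 153: 21}
```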

After that, you could use any of the solutions in this related question to add them together. For example (taken from its first answer):

import collections

def d_sum(a, b):
    d = collections.defaultdict(int, a)
    for k, v in b.items():
        d[k] += v
    return dict(d)

print(d_sum(d1, d2))
# {14: 3, 16: 7, 153: 21, 18: 9}
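The question asks for the result back in the flat `[id, count, ...]` layout, sorted by id. A minimal sketch of that last step, using a hypothetical helper named `dict_to_pairs` and the summed dictionary printed above as input:

```python
def dict_to_pairs(d):
    """Flatten {id: count} back into [id, count, id, count, ...], sorted by id.

    dict_to_pairs is a hypothetical helper name.
    """
    flat = []
    for key in sorted(d):
        # Append the id followed by its count
        flat.extend((key, d[key]))
    return flat


summed = {14: 3, 16: 7, 153: 21, 18: 9}
print(dict_to_pairs(summed))  # [14, 3, 16, 7, 18, 9, 153, 21]
```

The same helper works on a `Counter`, since it only needs `sorted()` over the keys and item lookup.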
David Robinson

Use collections.Counter:

import itertools
import collections

lst1 = [14, 1, 16, 4, 153, 21]
lst2 = [14, 2, 16, 3, 18, 9]

def grouper(n, iterable, fillvalue=None):
    # Collect data into fixed-length chunks: grouper(2, 'ABCD') --> AB CD
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

count1 = collections.Counter(dict(grouper(2, lst1)))
count2 = collections.Counter(dict(grouper(2, lst2)))
result = count1 + count2

I've used the grouper recipe from the itertools documentation here to convert your data to dictionaries, but as other answers have shown, there are more ways to skin that particular cat.
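The grouper recipe works because `[iter(iterable)] * n` builds a list containing the same iterator n times, not n independent iterators. A small sketch of that mechanism on the first array from the question (in Python 3, `izip_longest` is named `itertools.zip_longest`):

```python
from itertools import zip_longest

data = [14, 1, 16, 4, 153, 21]

it = iter(data)
args = [it] * 2  # one list holding the *same* iterator object twice
print(args[0] is args[1])  # True

# zip_longest() takes one item from each argument per tuple; because both
# arguments are the same iterator, each tuple gets two consecutive elements
pairs = list(zip_longest(*args))
print(pairs)  # [(14, 1), (16, 4), (153, 21)]

# With an odd-length list, the last id is padded with fillvalue (None by default)
padded = list(zip_longest(*[iter([14, 1, 16])] * 2))
print(padded)  # [(14, 1), (16, None)]
```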

result is a Counter with each id pointing to a total count:

Counter({153: 21, 18: 9, 16: 7, 14: 3})

Counters are multisets: they keep track of the count for each key with ease, which makes them a much better fit for your data. They also support addition, as used above.
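Since the question mentions merging a few arrays, not just two, one way to extend this is to accumulate everything into a single Counter with `update()`. A sketch, with a hypothetical third array added only for illustration. It also shows one behaviour of `+` worth knowing: it discards keys whose total is zero or negative, whereas `update()` keeps them.

```python
from collections import Counter

lists = [
    [14, 1, 16, 4, 153, 21],
    [14, 2, 16, 3, 18, 9],
    [16, 1, 20, 5],  # hypothetical third list, for illustration only
]

total = Counter()
for flat in lists:
    # update() with a mapping adds its values to the existing counts in place
    total.update(dict(zip(flat[0::2], flat[1::2])))
print(total)  # Counter({153: 21, 18: 9, 16: 8, 20: 5, 14: 3})

# Caveat: `+` keeps only positive totals, while update() keeps every key
a = Counter({14: 2})
b = Counter({14: -2, 16: 0})
print(a + b)  # Counter()
kept = a.copy()
kept.update(b)
print(kept)  # Counter({14: 0, 16: 0})
```

With counts that are always positive, as in the question, `+` and `update()` give the same totals.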

Martijn Pieters

collections.Counter() is what you need here:

In [21]: lis1=[14, 1, 16, 4, 153, 21]

In [22]: lis2=[14, 2, 16, 3, 18, 9]

In [23]: from collections import Counter

In [24]: dic1=Counter(dict(zip(lis1[0::2],lis1[1::2])))

In [25]: dic2=Counter(dict(zip(lis2[0::2],lis2[1::2])))

In [26]: dic1+dic2
Out[26]: Counter({153: 21, 18: 9, 16: 7, 14: 3})

Or, using an iterator:

In [51]: it1=iter(lis1)

In [52]: it2=iter(lis2)

In [53]: dic1=Counter(dict((next(it1),next(it1)) for _ in range(len(lis1)//2)))

In [54]: dic2=Counter(dict((next(it2),next(it2)) for _ in range(len(lis2)//2)))

In [55]: dic1+dic2
Out[55]: Counter({153: 21, 18: 9, 16: 7, 14: 3})
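The iterator version above still calls `len()`, so it needs a real sequence. Zipping one iterator with itself (the same trick the grouper recipe relies on) drops that requirement, which can help if the numbers arrive as a stream. A sketch with a hypothetical helper `counter_from_pairs` and a hypothetical generator `numbers()` standing in for such a stream:

```python
from collections import Counter


def counter_from_pairs(iterable):
    """Build a Counter from a flat id, count, id, count, ... iterable.

    counter_from_pairs is a hypothetical helper. It only needs an iterator,
    not len() or slicing, so it also works on generators.
    """
    it = iter(iterable)
    # Each tuple draws two consecutive items from the same iterator.
    # A trailing unpaired id is silently dropped, as with the answers above.
    return Counter(dict(zip(it, it)))


def numbers():
    # Hypothetical stand-in for a stream that has no len()
    yield from [14, 2, 16, 3, 18, 9]


total = counter_from_pairs([14, 1, 16, 4, 153, 21]) + counter_from_pairs(numbers())
print(total)  # Counter({153: 21, 18: 9, 16: 7, 14: 3})
```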
Ashwini Chaudhary

All of the previous answers look good, but I think the data should be properly formed JSON to begin with; in my experience, anything else can cause serious problems down the road, during debugging, for example. In this case, with id and count as the fields, the JSON should look like:

[{"id":1, "count":10}, {"id":2, "count":10}, {"id":1, "count":5}, ...]

Properly formed JSON like that is much easier to deal with, and probably similar to what you have coming in anyway.
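To make the suggested layout concrete, here is a small sketch that converts the first array from the question into it with the standard `json` module; the field names "id" and "count" are the ones from the example above:

```python
import json

flat = [14, 1, 16, 4, 153, 21]

# Build the list-of-objects layout; "id" and "count" are the field names
# used in the example above
records = [{"id": i, "count": c} for i, c in zip(flat[0::2], flat[1::2])]

as_objects = json.dumps(records)
as_flat = json.dumps(flat)
print(as_objects)
# [{"id": 14, "count": 1}, {"id": 16, "count": 4}, {"id": 153, "count": 21}]
print(as_flat)  # [14, 1, 16, 4, 153, 21]

# The objects form repeats the field names in every record, so it is larger
print(len(as_objects), len(as_flat))
```

Because every record repeats the field names, the serialized message grows with the number of records, which is the trade-off raised in the comments on this answer.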

This class is fairly general, but easy to extend:


from itertools import groupby
from operator import itemgetter


class ListOfDicts:
    def __init__(self, listofD=None):
        # Store the records; default to an empty list
        self.list = listofD if listofD is not None else []

    def key_total(self, group_by_key, aggregate_key):
        """Aggregate a list of dicts by group_by_key, summing aggregate_key."""
        keyfunc = itemgetter(group_by_key)
        out_dict = {}
        # groupby() only groups consecutive records, so sort by the key first
        for k, g in groupby(sorted(self.list, key=keyfunc), key=keyfunc):
            print(k)
            total = 0
            for record in g:
                print("   ", record)
                total += record[aggregate_key]
            out_dict[k] = total
        return out_dict


if __name__ == "__main__":
    z = ListOfDicts([{'id': 1, 'count': 2, 'junk': 2},
                     {'id': 1, 'count': 4, 'junk': 2},
                     {'id': 1, 'count': 6, 'junk': 2},
                     {'id': 2, 'count': 2, 'junk': 2},
                     {'id': 2, 'count': 3, 'junk': 2},
                     {'id': 2, 'count': 3, 'junk': 2},
                     {'id': 3, 'count': 10, 'junk': 2},
                     ])

    totals = z.key_total("id", "count")
    print(totals)

Which gives:

1
    {'id': 1, 'count': 2, 'junk': 2}
    {'id': 1, 'count': 4, 'junk': 2}
    {'id': 1, 'count': 6, 'junk': 2}
2
    {'id': 2, 'count': 2, 'junk': 2}
    {'id': 2, 'count': 3, 'junk': 2}
    {'id': 2, 'count': 3, 'junk': 2}
3
    {'id': 3, 'count': 10, 'junk': 2}

{1: 12, 2: 8, 3: 10}

reptilicus
  • I have 50k elements. I am avoiding JSON in that style because it would increase the size of the message by a factor of 10. – charliehorse55 Jan 03 '13 at 20:03
  • Yeah, that might be taxing on the server, depending on how often it's getting hit. In my experience, though, nicely formed JSON makes life better... – reptilicus Jan 03 '13 at 20:16
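The question notes that each array is already sorted by identifier, and the comments mention 50k elements, but none of the answers above uses that ordering. A different technique from the answers, a k-way merge with `heapq.merge` (its `key` argument needs Python 3.5+) followed by `itertools.groupby`, can sum the counts without building intermediate dictionaries and returns the flat layout from the question directly. A minimal sketch with a hypothetical helper `merge_sorted_pairs`, assuming every input really is sorted by id:

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_pairs(*flat_lists):
    """Merge flat [id, count, ...] lists that are each already sorted by id.

    merge_sorted_pairs is a hypothetical helper. Counts for equal ids are
    summed, and the result keeps the flat [id, count, ...] layout, sorted by id.
    """
    streams = []
    for flat in flat_lists:
        it = iter(flat)
        # zip() on one iterator twice yields consecutive (id, count) tuples
        streams.append(zip(it, it))

    merged = []
    by_id = itemgetter(0)
    # heapq.merge() lazily interleaves the already-sorted streams by id, so
    # equal ids from different lists end up next to each other for groupby()
    for key, group in groupby(heapq.merge(*streams, key=by_id), key=by_id):
        merged.append(key)
        merged.append(sum(count for _, count in group))
    return merged


print(merge_sorted_pairs([14, 1, 16, 4, 153, 21], [14, 2, 16, 3, 18, 9]))
# [14, 3, 16, 7, 18, 9, 153, 21]
```

If an input is not sorted by id, this produces wrong totals rather than an error, so the sorted-input assumption matters. Returning a generator instead of a list would keep memory use low for very large inputs.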