
I've got a list of Tokens which looks something like:

[{
    Value: "Blah",
    StartOffset: 0,
    EndOffset: 4
}, ... ]

What I want to do is get a count of how many times each value occurs in the list of tokens.

In VB.Net I'd do something like...

Tokens = Tokens.
GroupBy(Function(x) x.Value).
Select(Function(g) New With {
           .Value = g.Key,
           .Count = g.Count})

What's the equivalent in Python?

Basic

5 Answers


IIUC, you can use collections.Counter:

>>> from collections import Counter
>>> tokens = [{"Value": "Blah", "SO": 0}, {"Value": "zoom", "SO": 5}, {"Value": "Blah", "SO": 2}, {"Value": "Blah", "SO": 3}]
>>> Counter(tok['Value'] for tok in tokens)
Counter({'Blah': 3, 'zoom': 1})

if you only need a count. If you want them grouped by the value, you could use itertools.groupby and something like:

>>> from itertools import groupby
>>> def keyfn(x):
...     return x['Value']
...
>>> [(k, list(g)) for k,g in groupby(sorted(tokens, key=keyfn), keyfn)]
[('Blah', [{'SO': 0, 'Value': 'Blah'}, {'SO': 2, 'Value': 'Blah'}, {'SO': 3, 'Value': 'Blah'}]), ('zoom', [{'SO': 5, 'Value': 'zoom'}])]

although it's a little trickier because groupby requires the grouped terms to be contiguous, and so you have to sort by the key first.
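To illustrate the contiguity requirement, here is a small sketch using the same token list as above (the names `counts_unsorted` and `counts_sorted` are only for illustration). Without sorting, groupby starts a new group every time the key changes, so 'Blah' shows up twice:

```python
from itertools import groupby

tokens = [
    {"Value": "Blah", "SO": 0},
    {"Value": "zoom", "SO": 5},
    {"Value": "Blah", "SO": 2},
    {"Value": "Blah", "SO": 3},
]

def keyfn(tok):
    return tok["Value"]

# Unsorted input: a new group starts whenever the key changes.
counts_unsorted = [(k, len(list(g))) for k, g in groupby(tokens, keyfn)]
print(counts_unsorted)  # [('Blah', 1), ('zoom', 1), ('Blah', 2)]

# Sorted input: equal keys are adjacent, so each key appears once.
counts_sorted = [(k, len(list(g)))
                 for k, g in groupby(sorted(tokens, key=keyfn), keyfn)]
print(counts_sorted)  # [('Blah', 3), ('zoom', 1)]
```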

DSM

Let's assume this is your Python list of dictionaries:

my_list = [{'Value': 'Blah',
            'StartOffset': 0,
            'EndOffset': 4},
           {'Value': 'oqwij',
            'StartOffset': 13,
            'EndOffset': 98},
           {'Value': 'Blah',
            'StartOffset': 6,
            'EndOffset': 18}]

A one liner:

len([i for i in my_list if i['Value'] == 'Blah'])  # returns 2
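If only one value's count is needed, the same idea can be written without building the intermediate list. A minimal sketch, assuming the list structure above (shortened here to the 'Value' key; `blah_count` is an illustrative name):

```python
my_list = [{'Value': 'Blah'}, {'Value': 'oqwij'}, {'Value': 'Blah'}]

# Booleans count as 0/1, so summing them counts the matches.
blah_count = sum(item['Value'] == 'Blah' for item in my_list)
print(blah_count)  # 2
```

Note that this makes one pass over the list per value; to count every distinct value at once, collections.Counter needs only a single pass.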
Paco
import collections

# example token list
tokens = [{'Value':'Blah', 'Start':0}, {'Value':'BlahBlah'}]

count = collections.Counter([d['Value'] for d in tokens])
print(count)

shows (the order of entries with equal counts may vary by Python version)

Counter({'BlahBlah': 1, 'Blah': 1})
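To get something closer to the Value/Count records produced by the VB.Net query in the question, the Counter can be turned into a list of dictionaries. This is only a sketch: the 'Count' key and the `records` name are assumptions chosen to mirror the VB.Net code, and a third token has been added so the counts differ.

```python
import collections

tokens = [{'Value': 'Blah', 'Start': 0},
          {'Value': 'BlahBlah'},
          {'Value': 'Blah', 'Start': 9}]

count = collections.Counter(d['Value'] for d in tokens)

# most_common() yields (value, count) pairs, highest count first.
records = [{'Value': value, 'Count': n} for value, n in count.most_common()]
print(records)
```

Each entry in `records` then has a 'Value' and a 'Count', ordered from most to least frequent.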
Useless
token = [{
    'Value': "Blah",
    'StartOffset': 0,
    'EndOffset': 4
}, ... ]

value_counter = {}

for t in token:
    v = t['Value']
    if v not in value_counter:
        value_counter[v] = 0
    value_counter[v] += 1

print(value_counter)
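The membership check in the loop can also be folded into the increment, either with dict.get or with collections.defaultdict. A sketch, using a small hypothetical token list to stand in for the elided one above (`value_counter_dd` is an illustrative name):

```python
from collections import defaultdict

# Hypothetical sample data; the original list is elided.
token = [{'Value': 'Blah'}, {'Value': 'zoom'}, {'Value': 'Blah'}]

# dict.get supplies 0 the first time a value is seen.
value_counter = {}
for t in token:
    value_counter[t['Value']] = value_counter.get(t['Value'], 0) + 1

# defaultdict(int) does the same by creating missing entries as 0.
value_counter_dd = defaultdict(int)
for t in token:
    value_counter_dd[t['Value']] += 1

print(value_counter)
print(dict(value_counter_dd))
```

Both loops produce the same tallies as the explicit `if v not in value_counter` version.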
Yarkee

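Another option is to convert the data to a pandas DataFrame and aggregate it there. Here data stands for your list of dictionaries, and 'key' and 'value' are placeholders for your own column names:

import pandas as pd
df = pd.DataFrame(data)
df.groupby('key')['value'].count()  # number of non-null 'value' entries per key
df.groupby('key')['value'].sum()    # total of 'value' per key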
Masoud