Build a Dict of Means from Pandas

Question

I have a dataframe that looks like this:

    start   stop   duration
0   1       2      1
1   3       4      2
2   2       1      2
3   4       3      0

I'm trying to build a dictionary with key= (start, stop) pairs and the value= avg of their duration, regardless of the order. In other words, (1,2) and (2,1) would both count as an occurrence of the pair (1,2).

Desired output: dict_avg= {(1,2):1.5, (3,4):1}

What's the best way to achieve this?

Brad Solomon · Accepted Answer · 2018-12-05T18:17:17.223

Here's one possible approach by taking a tuple of the sorted pairs:

>>> grp = df[['start', 'stop']].apply(lambda x: tuple(sorted(x)), axis=1)
>>> df.groupby(grp)['duration'].mean().to_dict()
{(1, 2): 1.5, (3, 4): 1.0}

As a disclaimer, I can almost guarantee this will be significantly slower than the NumPy-sort given here, as using a lambda within .apply() (and needing to use sorted() + tuple() constructor) takes each call in the Python space, rather than letting it be done in Cython/C as you can ideally do via Pandas/NumPy.

score 3 · Answer 2 · answered Dec 05 '18 at 17:57

3

Using frozenset not fast but neat , if you looking for efficient way check link

df.groupby(df[['start','stop']].apply(frozenset,1).map(tuple)).duration.mean().to_dict()
Out[1048]: {(1, 2): 1.5, (3, 4): 1.0}

answered Dec 05 '18 at 17:57

BENY

317,841
20
164
234

score 3 · Answer 3 · answered Dec 05 '18 at 17:59

3

This is also one way:

# sort data based on first two columns
df.iloc[:,:2].values.sort()

# create the dict of mean
df.groupby(['start','stop'])['duration'].mean().to_dict()

{(1, 2): 1.5, (3, 4): 1.0}

answered Dec 05 '18 at 17:59

YOLO

20,181
5
20
40

score 3 · Answer 4 · answered Dec 05 '18 at 18:02

3

`defaultdict`

from collections import defaultdict

m = defaultdict(list)

for *t, d in zip(*map(df.get, df)):
    m[tuple({*t})].append(d)

{k: sum(v) / len(v) for k, v in m.items()}

{(1, 2): 1.5, (3, 4): 1.0}

answered Dec 05 '18 at 18:02

piRSquared

285,575
57
475
624

Build a Dict of Means from Pandas

4 Answers4

defaultdict

`defaultdict`