How to aggregate columns of type `dict`

Question

I have a Frame as follows:

import datatable as dt

x = dt.Frame(k=[1, 1, 2],
             v=[{'a': 1, 'b': 2}, {'a': 3}, {'b': 4}])

which looks like this:

k       v
▪▪▪▪    ▪▪▪▪▪▪▪▪
1       {'a': 1, 'b': 2}
1       {'a': 3}
2       {'b': 4}

What I'm trying to do is 1) group by k, and 2) sum the values in the dictionaries, key by key, within each group. The desired output:

k       v
▪▪▪▪    ▪▪▪▪▪▪▪▪
1       {'a': 4, 'b': 2}
2       {'b': 4}

Is it possible to achieve this with the latest pydatatable (v0.11)?
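To pin down the intended semantics (values under the same dictionary key are added within each group), here is a plain-Python reference sketch using `collections.Counter`. It works on the raw lists rather than on a datatable Frame, and as a Python loop it will not be fast on large data; `sum_dicts_by_key` is a hypothetical helper name, not part of any library.

```python
from collections import Counter, defaultdict


def sum_dicts_by_key(keys, dicts):
    """Group `dicts` by the parallel `keys` list and add up values per dict key."""
    totals = defaultdict(Counter)
    for k, d in zip(keys, dicts):
        totals[k].update(d)  # Counter.update adds to existing counts
    return {k: dict(c) for k, c in totals.items()}


print(sum_dicts_by_key([1, 1, 2], [{'a': 1, 'b': 2}, {'a': 3}, {'b': 4}]))
# {1: {'a': 4, 'b': 2}, 2: {'b': 4}}
```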

It's better to modify the dictionaries rather than the dataframe. — deadshot, Sep 04 '20 at 19:42
@deadshot Could you elaborate? The original data is stored in a `pandas.DataFrame` (with exactly the same column types), and I can achieve my goal with `DataFrame.groupby`. However, I found it painful at this data size, which is why I turned to `pydatatable`. — R. Zhu, Sep 04 '20 at 19:55

score 3 · Accepted Answer · answered Sep 04 '20 at 21:54

If you have a large dataset, consider expanding all the dictionaries into a frame:

>>> DT = dt.cbind(dt.Frame(_key=[1,1,2]), 
                  dt.Frame([{'a':1, 'b':2}, {'a':3}, {'b':4}]))
>>> DT
   | _key   a   b
-- + ----  --  --
 0 |    1   1   2
 1 |    1   3  NA
 2 |    2  NA   4

[3 rows x 3 columns]
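The expansion above is essentially a row-to-column transpose: the union of the dictionary keys becomes the set of columns, and a row that lacks a key gets a missing value. A plain-Python sketch of that idea (not datatable's actual implementation), with `None` standing in for NA; `expand_dicts` is a hypothetical helper.

```python
def expand_dicts(rows):
    """Transpose a list of row dicts into a dict of equal-length columns."""
    # Union of all keys, in first-seen order (dict.fromkeys keeps insertion order).
    names = dict.fromkeys(key for row in rows for key in row)
    # A row that lacks a key gets None, playing the role of NA.
    return {name: [row.get(name) for row in rows] for name in names}


columns = expand_dicts([{'a': 1, 'b': 2}, {'a': 3}, {'b': 4}])
print(columns)  # {'a': [1, 3, None], 'b': [2, None, 4]}
```

Once the data is column-oriented like this, each key is an ordinary numeric column, which is what lets the grouped `sum` below run without touching Python objects.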

After this, grouping is easy:

>>> from datatable import sum, f, by
>>> DT[:, sum(f[:]), by(f._key)]
   | _key   a   b
-- + ----  --  --
 0 |    1   4   2
 1 |    2   0   4

[2 rows x 3 columns]
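The grouped result is wide, and a key that never occurs in a group shows up as 0 (see `a` for `_key` 2), whereas the question asked for dictionaries back. A minimal pure-Python sketch of that reverse step, assuming the result has been pulled out as a column-name to list mapping; `wide_to_dicts` is a hypothetical helper, not part of datatable.

```python
def wide_to_dicts(columns, key="_key"):
    """Rebuild one dict per group from a column-oriented grouped result.

    `columns` maps column name -> list of values. Entries equal to 0 or None
    are dropped, since sum() over an all-NA group shows up as 0.
    """
    value_names = [name for name in columns if name != key]
    result = {}
    for i, group in enumerate(columns[key]):
        result[group] = {
            name: columns[name][i]
            for name in value_names
            if columns[name][i] not in (0, None)
        }
    return result


grouped = {'_key': [1, 2], 'a': [4, 0], 'b': [2, 4]}
print(wide_to_dicts(grouped))  # {1: {'a': 4, 'b': 2}, 2: {'b': 4}}
```

With datatable, a mapping of this shape could come from something like `DT[:, sum(f[:]), by(f._key)].to_dict()`. Dropping zeros is a simplification: a key whose values genuinely sum to 0 disappears too, so counting non-NA values per group alongside the sum would be the stricter way to tell the two cases apart.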

Thanks @Pasha. Yes, I should have considered expanding the dictionaries first. — R. Zhu, Sep 05 '20 at 22:10
