
I would like to iterate through groups in a dataframe. This is possible in pandas, but when I port this to koalas, I get an error.

import databricks.koalas as ks
import pandas as pd

pdf = pd.DataFrame({'x':range(3), 'y':['a','b','b'], 'z':['a','b','b']})

# Create a Koalas DataFrame from pandas DataFrame
df = ks.from_pandas(pdf)

for a in df.groupby('x'):
    print(a)

Here is the error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-35-d4164d1f71e0> in <module>
----> 1 for a in df.groupby('x'):
      2     print(a)

/opt/conda/lib/python3.7/site-packages/databricks/koalas/groupby.py in __getitem__(self, item)
   2630         if self._as_index and is_name_like_value(item):
   2631             return SeriesGroupBy(
-> 2632                 self._kdf._kser_for(item if is_name_like_tuple(item) else (item,)),
   2633                 self._groupkeys,
   2634                 dropna=self._dropna,

/opt/conda/lib/python3.7/site-packages/databricks/koalas/frame.py in _kser_for(self, label)
    721         Name: id, dtype: int64
    722         """
--> 723         return self._ksers[label]
    724 
    725     def _apply_series_op(self, op, should_resolve: bool = False):

KeyError: (0,)

Is this kind of group iteration possible in Koalas? The Koalas documentation seems to imply that it is: https://koalas.readthedocs.io/en/latest/reference/groupby.html

Chogg

2 Answers


Groupby iteration is not yet implemented:

https://github.com/databricks/koalas/issues/2014
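Until iteration is supported, one possible workaround is to collect the distinct key values first and then filter the frame once per key. The sketch below is only an illustration: `iter_groups` is a hypothetical helper, not part of Koalas or pandas, and it is demonstrated on the pandas frame from the question. It relies only on `drop_duplicates()`, `to_numpy()`, and boolean-mask indexing, which Koalas also documents, so passing the Koalas `df` instead should behave the same way, but that is an assumption and has not been verified here.

```python
import pandas as pd


def iter_groups(df, key):
    """Yield (key_value, sub_frame) pairs, one per distinct value of `key`.

    Hypothetical helper: emulates `for name, group in df.groupby(key)`
    by filtering the frame once per distinct key value.
    """
    # Bring the distinct keys to the driver as plain Python scalars.
    keys = df[key].drop_duplicates().to_numpy().tolist()
    for k in sorted(keys):
        # Boolean-mask selection of the rows that belong to this group.
        yield k, df[df[key] == k]


pdf = pd.DataFrame({'x': range(3), 'y': ['a', 'b', 'b'], 'z': ['a', 'b', 'b']})

for name, group in iter_groups(pdf, 'x'):
    print(name)
    print(group)
```

On a Spark-backed frame each filter would run as a separate job, so this approach is only reasonable when the number of distinct keys is small.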

Chogg

Don't use a for-in loop; use apply instead:

df1.groupby("School").apply(lambda dd:print(dd))
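If the goal is to compute something per group rather than just print it, the function passed to `apply` can return a value. The following is a minimal sketch in plain pandas using the frame from the question; `summarize` is a hypothetical function introduced only for this example. Koalas documents a `GroupBy.apply` with the same call shape, but whether this exact snippet runs unchanged on a Koalas frame is an assumption.

```python
import pandas as pd


def summarize(group: pd.DataFrame) -> pd.DataFrame:
    """Return a one-row summary for a single group (hypothetical example)."""
    return pd.DataFrame({"n_rows": [len(group)], "x_sum": [group["x"].sum()]})


pdf = pd.DataFrame({'x': range(3), 'y': ['a', 'b', 'b'], 'z': ['a', 'b', 'b']})

# Select the column to work on, then run `summarize` once per group of `y`.
result = pdf.groupby("y")[["x"]].apply(summarize)
print(result)
```

Returning a frame from the function keeps the per-group results together in one object instead of relying on printed output.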
Tyler2P
G.G