I would like to use groupby on my dataframe and then chain a series of function calls on each group with apply.

As a first prototype, I've set up an example where I convert the entries of my dataframe from string to numeric. The dataframe looks like this:

import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

The resulting dataframe is:

[Image: structure of the dataframe]

The numbers in this dataframe are strings. So before I can use any math operations, they have to be converted to a numerical type. That's what I would like to do with apply:

frame.groupby("type")["number"].apply(pd.to_numeric)

But the result is a single series which contains all items:

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
Name: number, dtype: int64

I've read the docs for this. Apparently you can use transform or apply. In the samples, the grouped structure seems to be kept.

Maybe it is something related to pd.to_numeric? So I tried:

frame.groupby("type")["number"].apply(lambda x: int(x))

Which results in a TypeError:

TypeError: cannot convert the series to

Apparently the apply gets a whole group as parameter. The results for each group seem to be concatenated into one dataframe.

Is it possible to use apply in a way that keeps the grouped structure? I would like a call that applies the function to each column within the groups and keeps the groups. Then I could chain the calls.

A related question I've found is this: pandas: sample groups after groupby

But the answer suggests applying the function before the grouping, which doesn't work well with chaining the functions, and not at all for something like mean().

lhk
  • Why don't you do `df['number'] = df['number'].astype(int)` *first*, and then do your groupby? – juanpa.arrivillaga Jan 16 '18 at 19:19
  • The setup is just an example. This tip is basically the same as the answer on sampling after groupby: applying the necessary functions first. It's definitely a good answer for this specific problem, but I would like to know how to do the chaining for more complex cases – lhk Jan 16 '18 at 19:23
  • I don't understand your title. What do you mean by "keep group structure"? """Is it possible to use apply in a way that keeps the grouped structure ? I would like a call that applies the function to each column within the groups and keeps the groups. Then I could chain the calls.""" <- What does this mean? Can you use more examples to illustrate? What is the `grouped structure` you want to keep? Are you trying to say you want to keep the number of rows the same? – Tai Jan 16 '18 at 20:36

1 Answer

The messages and behavior you are seeing arise because you are in fact calling pd.core.groupby.SeriesGroupBy.apply(self, func, *args, **kwargs), not Series.apply or DataFrame.apply.
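To make that concrete, here is a minimal sketch using the frame from the question. The helper name `record` is hypothetical and only there for illustration; it notes what the group-by apply passes to the function. Each call should receive a whole group as a Series rather than a single value, which is why `int(x)` fails.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

grouped = frame.groupby("type")["number"]
print(type(grouped).__name__)  # SeriesGroupBy

received = []

def record(group):
    # Hypothetical helper: note the type and length of what apply passes in,
    # then hand the group back unchanged.
    received.append((type(group).__name__, len(group)))
    return group

grouped.apply(record)
print(received)  # each entry describes one call: a Series holding a whole group
```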

> But the result is a single series which contains all items:

It seems to correspond to case #3 described here.

> Apparently the apply gets a whole group as parameter.

Yes.

> The results for each group seem to be concatenated into one dataframe.

That depends on which of the cases linked above applies.

> Is it possible to use apply in a way that keeps the grouped structure? I would like a call that applies the function to each column within the groups and keeps the groups. Then I could chain the calls.

You would have to give more details on what you are trying to achieve, but aggregate or transform do seem like good candidates.
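As a rough sketch of what that could look like, assuming the goal is to run a small pipeline per group: transform returns one value per original row, agg returns one value per group, and the "chain" can live inside a single function that is called once per group. The function name `centered` is hypothetical, and this is only one possible reading of the question.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

by_type = frame.groupby("type")["number"]

# transform: the result is aligned with the original rows.
as_numbers = by_type.transform(pd.to_numeric)

def centered(group):
    # Hypothetical per-group pipeline: convert to numbers,
    # then subtract the mean of that group.
    values = pd.to_numeric(group)
    return values - values.mean()

deviations = by_type.transform(centered)

# agg: the result has one entry per group, indexed by "type".
means = by_type.agg(lambda group: pd.to_numeric(group).mean())

print(deviations)
print(means)
```

Since the groupby object itself is not changed by these calls, the grouping can be reused for each step, or the steps can be composed inside one function as in `centered`.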

Phik