I would like to use groupby on my dataframe and then chain a series of function calls on each group with apply.
As a first prototype, I've set up an example where I convert the entries of my dataframe from string to numeric. The dataframe looks like this:
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})
The resulting dataframe is:
  number type
0      1    a
1      2    a
2      3    a
3      4    a
4      5    b
5      6    b
6      7    b
7      8    b
The numbers in this dataframe are strings. So before I can use any math operations, they have to be converted to a numerical type. That's what I would like to do with apply:
frame.groupby("type")["number"].apply(pd.to_numeric)
But the result is a single series which contains all items:
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
Name: number, dtype: int64
I've read the docs for this. Apparently you can use transform or apply. In the samples, the grouped structure seems to be kept.
Maybe it is something related to pd.to_numeric? So I tried:
frame.groupby("type")["number"].apply(lambda x: int(x))
Which results in a TypeError:
TypeError: cannot convert the series to
Apparently apply receives each whole group as its argument, so int(x) is called on a Series rather than on a single value. The results for each group seem to be concatenated into one series.
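To check this, here is a minimal sketch that records what apply hands to the function. The helper name record_and_convert and the list received are made up for this illustration; everything else is the example from above.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

received = []  # (type name, length) of every object apply passes in


def record_and_convert(group):
    # Hypothetical helper: note what arrived, then convert it to numbers.
    received.append((type(group).__name__, len(group)))
    return pd.to_numeric(group)


result = frame.groupby("type")["number"].apply(record_and_convert)

print(received)         # what the function was called with
print(result.tolist())  # the converted values, all groups combined
```

Every entry in received is a Series holding the four rows of one group, which would explain why int(x) fails while pd.to_numeric works.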
Is it possible to use apply in a way that keeps the grouped structure? I would like a call that applies the function to each column within the groups and keeps the groups. Then I could chain the calls.
A related question I've found is this: pandas: sample groups after groupby
But the answer suggests applying the function before the grouping, which doesn't work well with chaining the functions, and not at all for something like mean().
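For illustration, this is a minimal sketch of the kind of chain I have in mind, assuming the steps are composed inside a single function that is handed to apply. The helper name to_numeric_mean is made up for this example, and this is only one possible way to write it, not necessarily the intended pandas idiom.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})


def to_numeric_mean(group):
    # Hypothetical helper: convert one group's strings to numbers,
    # then reduce that group to its mean.
    return pd.to_numeric(group).mean()


# One value per group, indexed by the group key.
means = frame.groupby("type")["number"].apply(to_numeric_mean)
print(means)
```

This gives one mean per group, but all steps have to live inside one function instead of being separate chained calls on the grouped object.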