If I have a data frame df and a function f() in R that look like this:
df = data.frame(
  group = c("cat","fish","horse","cat","fish","horse","cat","horse"),
  x = c(1,4,7,2,5,8,3,9)
)
f <- function(animal,x) {
  nchar(animal) + mean(x)*(x+1)
}
Applying f() to each group to add a new column with the result of f() is straightforward:
library(dplyr)
mutate(group_by(df,group),result=f(cur_group(),x))
Output:
  group     x result
  <chr> <dbl>  <dbl>
1 cat       1    7
2 fish      4   26.5
3 horse     7   69
4 cat       2    9
5 fish      5   31
6 horse     8   77
7 cat       3   11
8 horse     9   85
What is the correct way to do the same in Python if d is a pandas.DataFrame?
import numpy as np
import pandas as pd
d = pd.DataFrame({"group":["cat","fish","horse","cat","fish","horse","cat","horse"], "x":[1,4,7,2,5,8,3,9]})
def f(animal,x):
    return [np.mean(x)*(k+1) + len(animal) for k in x]
I know I can get the "correct" values like this:
d.groupby("group").apply(lambda g: f(g.name,g.x))
and can "explode" that into a single Series using .explode(), but what is the correct way to get the values added to the frame, in the correct order, etc.?
Expected Output (python)
   group  x  result
0    cat  1     7.0
1   fish  4    26.5
2  horse  7    69.0
3    cat  2     9.0
4   fish  5    31.0
5  horse  8    77.0
6    cat  3    11.0
7  horse  9    85.0
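For reference, here is a minimal sketch of one way this might be done. It is not necessarily the idiomatic answer. It assumes the frame has a unique index, so that each group's results can be wrapped in a Series carrying that group's original row labels and pandas can align them back on assignment. The name parts is just an illustrative helper variable.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({
    "group": ["cat", "fish", "horse", "cat", "fish", "horse", "cat", "horse"],
    "x": [1, 4, 7, 2, 5, 8, 3, 9],
})

def f(animal, x):
    return [np.mean(x) * (k + 1) + len(animal) for k in x]

# Iterate over (group key, sub-frame) pairs. Each result list is wrapped in a
# Series that keeps the sub-frame's original index labels.
parts = [
    pd.Series(f(name, g["x"]), index=g.index)
    for name, g in d.groupby("group")
]

# Column assignment aligns on the index, so the values land on their original
# rows even though the groups were processed in sorted key order.
d["result"] = pd.concat(parts)

print(d)
```

Because the alignment happens on the index rather than on position, the same sketch should also hold if the rows are not in any particular order, as long as the index labels are unique.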