I want to group by two columns and, if a group contains more than one row, return only the first row of the group with the value in column c replaced by the group mean. If a group has only one row, I want to return that row unchanged. My code looks like this:
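def condense(df):
    if df.shape[0] > 1:
        # Collapse the group to its first row, with "c" set to the group mean.
        mean = df["c"].mean()
        record = df.iloc[[0]].copy()  # copy so the assignment does not hit a slice
        record["c"] = mean
        return record
    else:
        # Single-row groups are returned as they are.
        return df

final = df.groupby(["a", "b"]).apply(condense).drop(["a", "b"], axis=1).reset_index()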
And the df looks something like this:
a    b    c  d
"f"  "e"  2  True
"f"  "e"  3  False
"c"  "a"  1  True
The data frame is quite large: there are 73,800 groups, and the whole groupby + apply takes about a minute. This is far too long. Is there a way to make it run faster?
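For reference, here is a vectorized variant that might avoid the per-group Python call entirely. It is only a sketch, not something timed on the full data: it assumes that a and b contain no missing values, that the row order within each group is the order that matters, and that it is acceptable for c to become a float column. The sample frame is rebuilt from the example above so the snippet runs on its own.

```python
import pandas as pd

# Sample frame rebuilt from the example in the question.
df = pd.DataFrame({
    "a": ["f", "f", "c"],
    "b": ["e", "e", "a"],
    "c": [2, 3, 1],
    "d": [True, False, True],
})

# Compute the mean of "c" per (a, b) group, broadcast back to every row.
group_mean = df.groupby(["a", "b"])["c"].transform("mean")

# Overwrite "c" with the group mean, then keep only the first row of each group.
final = (
    df.assign(c=group_mean)
      .drop_duplicates(subset=["a", "b"], keep="first")
      .reset_index(drop=True)
)

print(final)
```

For a single-row group the mean equals the original value, so those rows keep their content apart from the change to float. Unlike the apply version, the result keeps a and b as ordinary columns and the groups appear in order of first occurrence rather than sorted by key.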