Is there a pandas DataFrame equivalent to using 'by' in R's data.table?
For example, in R I can do:
DT = data.table(x = c('a', 'a', 'a', 'b', 'b', 'b'), y = rnorm(6))
DT[, z := mean(y[1:2]), by = x]
Is there something similar in pandas?
To get the same output as in data.table, where we take the mean of the first two elements of 'y' within each group of 'x' and assign it to a new column 'z', we can use groupby with transform:
mean1 = lambda x: x.head(2).mean()
df['z'] = df['y'].groupby(df['x']).transform(mean1)
print(df)
# x y z
#0 a 1.329212 0.279589
#1 a -0.770033 0.279589
#2 a -0.316280 0.279589
#3 b -0.990810 -1.030813
#4 b -1.070816 -1.030813
#5 b -1.438713 -1.030813
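If you want to run this without depending on the random seed, here is a minimal self-contained sketch. The 'y' values are hard-coded from the printed output above, and `mean_first_two` is just an illustrative name, not anything built into pandas.

```python
import pandas as pd

# 'y' values copied from the printed output above, so no seed is needed
df = pd.DataFrame({
    'x': ['a', 'a', 'a', 'b', 'b', 'b'],
    'y': [1.329212, -0.770033, -0.316280, -0.990810, -1.070816, -1.438713],
})

def mean_first_two(s):
    # mean of the first two values of the group (illustrative helper)
    return s.head(2).mean()

# transform broadcasts the per-group scalar back to every row of the group,
# which is what `:=` with `by` does in data.table
df['z'] = df.groupby('x')['y'].transform(mean_first_two)
print(df)
```

Selecting the column after grouping (`df.groupby('x')['y']`) behaves the same as `df['y'].groupby(df['x'])` here; it is only a spelling preference.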
For comparison, the OP's data.table code in R gives the same result:
library(data.table)
DT[, z := mean(y[1:2]), by = x]
DT
# x y z
#1: a 1.329212 0.2795895
#2: a -0.770033 0.2795895
#3: a -0.316280 0.2795895
#4: b -0.990810 -1.0308130
#5: b -1.070816 -1.0308130
#6: b -1.438713 -1.0308130
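One caveat if some groups can have fewer than two rows: in R, `y[1:2]` on a length-one vector pads with NA, so `mean()` returns NA for that group, whereas `head(2).mean()` in pandas just averages whatever rows exist. The sketch below, on made-up data, shows both behaviours side by side; `mean_first_two_strict` is a hypothetical helper that mimics the R result.

```python
import numpy as np
import pandas as pd

# made-up data: group 'b' has only one row
df = pd.DataFrame({'x': ['a', 'a', 'b'], 'y': [1.0, 3.0, 5.0]})

def mean_first_two_strict(s):
    # hypothetical helper: NaN when the group has fewer than two rows,
    # mirroring mean(y[1:2]) in R
    return s.iloc[:2].mean() if len(s) >= 2 else np.nan

# averages the rows that exist, even if there is only one
df['z_head'] = df.groupby('x')['y'].transform(lambda s: s.head(2).mean())
# returns NaN for short groups
df['z_strict'] = df.groupby('x')['y'].transform(mean_first_two_strict)
print(df)
```

For the data in the question every group has three rows, so the two versions agree.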
Data used for the pandas example:
import pandas as pd
import numpy as np
np.random.seed(24)  # fixed seed so the values are reproducible
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'y': np.random.randn(6)})
Data used for the R example:
DT <- structure(list(x = c("a", "a", "a", "b", "b", "b"),
                     y = c(1.329212, -0.770033, -0.31628,
                           -0.99081, -1.070816, -1.438713)),
                .Names = c("x", "y"),
                class = c("data.table", "data.frame"),
                row.names = c(NA, -6L))