Perhaps a dumb question, but...
In R data.table, if I want to get the mean of a column, I can reference the column vector as foo$x and calculate its mean with something like mean(foo$x).
I can't figure out how to do this operation with Python datatable. For example,
# imports
import numpy as np
import datatable as dt
from datatable import f
# make datatable
np.random.seed(1)
foo = dt.Frame({'x': np.random.randn(10)})
# calculate mean
dt.mean(foo.x) # error
dt.mean(foo[:, f.x]) # Expr:mean(<Frame [10 rows x 1 col]>) ???
foo[:, dt.mean(f.x)][0, 0] # -0.0971
While the last statement technically works, it seems overly cumbersome, as it first returns a 1x1 datatable from which I then extract the only value.
The fundamental issue I'm struggling with is that I don't understand whether column vectors exist in Python datatable and, if so, how to reference them.
In short, is there a simpler way to calculate the mean of a column with Python datatable?