fast way to aggregate many metrics on a large dataframe

Asked Dec 19 '20 at 01:22

Active Dec 19 '20 at 01:50


I have a large dataframe with millions of rows and hundreds of columns.

My aim is to apply some metrics to specific columns, grouped by `id`:

```python
from scipy.stats import moment
moment(df.groupby('id', as_index=False)[df.columns[0:100]], moment=3)
```

I also need to compute other statistics (mean, standard deviation, 4th moment, Sarle's coefficient, ...). Is there a way to do this aggregation and calculate many statistics faster?
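One possible vectorized sketch, not taken from the thread and assuming numeric columns plus a grouping column named `'id'` (the helper `grouped_moments` and the toy columns `a` and `b` are hypothetical). Instead of calling `scipy.stats.moment` on each group, it broadcasts every group's mean back to its rows with `transform("mean")` and then takes grouped means of the powered deviations. This gives the same central moments (divisor n) as `scipy.stats.moment`'s default while using only pandas' built-in groupby reductions.

```python
import pandas as pd


def grouped_moments(df: pd.DataFrame, by: str, cols: list) -> pd.DataFrame:
    """Hypothetical helper: per-group mean, sample std and 3rd/4th central moments.

    The 3rd/4th moments divide by n, like scipy.stats.moment's default.
    """
    values = df[cols]
    keys = df[by]
    # Broadcast each group's mean back onto its rows in one vectorized pass
    dev = values - values.groupby(keys).transform("mean")
    stats = {
        "mean": values.groupby(keys).mean(),
        "std": values.groupby(keys).std(),  # sample std, ddof=1
        "m3": (dev ** 3).groupby(keys).mean(),
        "m4": (dev ** 4).groupby(keys).mean(),
    }
    # Columns become a (statistic, column) MultiIndex, one row per group
    return pd.concat(stats, axis=1)


# Hypothetical toy data standing in for the real dataframe
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "a": [1.0, 2.0, 6.0, 0.0, 0.0, 3.0],
    "b": [2.0, 3.0, 2.0, 1.0, 4.0, 7.0],
})
res = grouped_moments(df, "id", ["a", "b"])
print(res.loc[1, ("m3", "a")])  # 6.0
```

The `dev ** 3` and `dev ** 4` temporaries are as large as the selected columns, so on millions of rows it may help to call the helper on smaller column chunks and concatenate the results. Statistics that are functions of these moments and the group sizes (`df.groupby('id').size()`) can then be derived column-wise without another pass over the groups.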

edited Dec 19 '20 at 01:50

hpaulj


asked Dec 19 '20 at 01:22

roger


Unless they say so, `scipy.stats` functions can't work directly on dataframes, much less on pandas `groups`. I think you'll need to iterate over the groups - but presumably you know how to work with `pandas` groups one by one. If not, you have some reading to do! – hpaulj Dec 19 '20 at 01:52
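Following this suggestion, a minimal sketch of the loop (the helper `per_group_stats` and the toy data are hypothetical, and numeric columns are assumed). `scipy.stats` functions such as `moment`, `skew` and `kurtosis` accept 2-D NumPy arrays with an `axis` argument, so converting each group with `to_numpy()` lets a single call cover all selected columns of that group.

```python
import pandas as pd
from scipy.stats import kurtosis, moment, skew


def per_group_stats(df: pd.DataFrame, by: str, cols: list) -> pd.DataFrame:
    """Hypothetical helper: call scipy.stats once per group on a 2-D array.

    With axis=0 each call covers every column of the group, so the Python
    loop runs once per group rather than once per (group, column) pair.
    """
    rows = {}
    for key, group in df.groupby(by):
        arr = group[cols].to_numpy(dtype=float)
        stats = {
            "mean": arr.mean(axis=0),
            "std": arr.std(axis=0, ddof=1),
            "m3": moment(arr, 3, axis=0),  # order passed positionally
            "m4": moment(arr, 4, axis=0),
            "skew": skew(arr, axis=0),
            "kurt": kurtosis(arr, axis=0),  # excess (Fisher) kurtosis
        }
        # Flatten to one Series indexed by (statistic, column)
        rows[key] = pd.concat(
            {name: pd.Series(v, index=cols) for name, v in stats.items()}
        )
    # One row per group, columns as a (statistic, column) MultiIndex
    return pd.DataFrame(rows).T


# Hypothetical toy data
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "a": [1.0, 2.0, 6.0, 0.0, 0.0, 3.0],
    "b": [2.0, 3.0, 2.0, 1.0, 4.0, 7.0],
})
res = per_group_stats(df, "id", ["a", "b"])
```

The loop still runs once per group, so this is most practical when the number of distinct `id` values is modest.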

You may use [`pandas.core.groupby.GroupBy.apply`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.apply.html) to aggregate many calculations. See [Apply function to pandas groupby](https://stackoverflow.com/q/15374597/7758804) – Trenton McKinney Dec 19 '20 at 02:07
@hpaulj yes, you're right, it's not possible to use it with dataframes – roger Dec 19 '20 at 15:27
@TrentonMcKinney I've tested it, but it doesn't let me apply many functions at once – roger Dec 19 '20 at 15:28
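For computing several functions in one call, a sketch using `GroupBy.agg` might look like this. It is swapped in for the `apply` linked above because `agg` accepts a list mixing built-in reduction names and callables; `third_moment`, `fourth_moment` and the toy data are hypothetical.

```python
import pandas as pd
from scipy.stats import moment


def third_moment(s: pd.Series) -> float:
    # Hypothetical named function, so its result column is labelled "third_moment"
    return float(moment(s.to_numpy(dtype=float), 3))


def fourth_moment(s: pd.Series) -> float:
    # Hypothetical named function for the 4th central moment
    return float(moment(s.to_numpy(dtype=float), 4))


# Hypothetical toy data
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "a": [1.0, 2.0, 6.0, 0.0, 0.0, 3.0],
    "b": [2.0, 3.0, 2.0, 1.0, 4.0, 7.0],
})
# String names use pandas' built-in groupby reductions; callables run per group
res = df.groupby("id")[["a", "b"]].agg(
    ["mean", "std", "skew", third_moment, fourth_moment]
)
```

The result has (column, function) pairs as its column MultiIndex. Only the callables fall back to a Python-level call per group and column, so on millions of rows the built-in string names will usually be much faster than the custom functions.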


0 Answers