I have a large dataframe with millions of rows and hundreds of columns.

My aim is to apply some metrics on specific columns:

from scipy.stats import moment 
moment(df.groupby('id', as_index=False)[df.columns[0:100]], moment=3)

I also have to compute other statistics (mean, standard deviation, 4th moment, Sarle's coefficient, ...). Is there a way to do this aggregation and calculate many other stats faster?
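One possible way to get these statistics quickly, sketched under assumptions: center each column by its group mean once, then take grouped means of powers of the centered values, so everything stays in vectorized pandas operations. The helper name `grouped_moments` and the toy data are hypothetical. The moments are the biased (population) central moments, the same definition `scipy.stats.moment` uses, and "Sarle's coefficient" is assumed here to mean the population form of the bimodality coefficient, (skewness² + 1) / kurtosis. Missing values and zero-variance groups are not handled.

```python
import numpy as np
import pandas as pd


def grouped_moments(df, key="id", cols=None):
    """Per-group mean, sd and central moments (hypothetical helper)."""
    cols = list(df.columns.difference([key])) if cols is None else list(cols)
    keys = df[key]

    mean = df.groupby(key)[cols].mean()
    # Subtract each row's group mean; transform keeps the original row index.
    centered = df[cols] - df.groupby(key)[cols].transform("mean")

    # k-th central moment per group = grouped mean of centered**k (ddof=0).
    m2 = (centered ** 2).groupby(keys).mean()
    m3 = (centered ** 3).groupby(keys).mean()
    m4 = (centered ** 4).groupby(keys).mean()

    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # non-excess kurtosis
    # Assumption: population form of Sarle's bimodality coefficient.
    sarle = (skew ** 2 + 1) / kurt

    # Columns become a MultiIndex of (statistic, original column).
    return pd.concat(
        {"mean": mean, "sd": np.sqrt(m2), "m3": m3, "m4": m4, "sarle": sarle},
        axis=1,
    )


# Toy data standing in for the real dataframe.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 5)), columns=[f"c{i}" for i in range(5)])
df.insert(0, "id", np.arange(1000) % 10)

stats = grouped_moments(df)
print(stats["m3"].head())
```

Note that `sd` here is the population standard deviation (`ddof=0`), whereas pandas' built-in `std` defaults to `ddof=1`.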

  • Unless they say so, `scipy.stats` functions can't work directly on dataframes, much less pandas groups. I think you'll need to iterate over the groups, but presumably you know how to work with `pandas` groups one by one. If not, you have some reading to do! – hpaulj Dec 19 '20 at 01:52
  • You may use [`pandas.core.groupby.GroupBy.apply`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.apply.html) to aggregate many calculations. See [Apply function to pandas groupby](https://stackoverflow.com/q/15374597/7758804) – Trenton McKinney Dec 19 '20 at 02:07
  • @hpaulj Yes, you're right, it's not possible to use it with dataframes. – roger Dec 19 '20 at 15:27
  • @TrentonMcKinney I've tested it, but it doesn't allow applying many functions at once. – roger Dec 19 '20 at 15:28
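
On applying many functions at once: `GroupBy.agg` accepts a list that mixes built-in names with your own named functions, so a `scipy.stats` call can be wrapped to receive one column of one group at a time. A minimal sketch, where the wrapper names `third_moment` and `fourth_moment` and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy.stats import moment


def third_moment(x):
    # x is one column of one group (a Series); pass scipy a plain array.
    return moment(x.to_numpy(), moment=3)


def fourth_moment(x):
    return moment(x.to_numpy(), moment=4)


# Toy data standing in for the real dataframe.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["c0", "c1", "c2"])
df.insert(0, "id", np.arange(200) % 4)

cols = ["c0", "c1", "c2"]
# Result columns form a MultiIndex of (original column, function name).
out = df.groupby("id")[cols].agg(["mean", "std", third_moment, fourth_moment])
print(out["c0"])
```

This calls Python once per group and column, so it is likely to be slow with millions of rows and hundreds of columns; it mainly shows that several statistics can be requested in one `agg` call.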
