EDITED:
I got to playing around with this a little. I am new so take this with a grain of salt:
Load sample with remote source
import pandas as pd
df1_iter = pd.read_csv("https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv",
chunksize=5,
iterator=True)
Do a simple for
look to do .describe
and .T
on each chunk and append it to list
Next use pd.concat()
on df_list
df_list = []
for chunk in df1_iter:
df_list.append(chunk.describe().T)
df_concat = pd.concat(df_list)
Groupby
For the agg
I used function I thought to be useful, adjust as needed.
desc_df = df_concat.groupby(df_concat.index).agg(
{
'mean':'mean',
'std': 'std',
'min': 'min',
'25%': 'mean',
'50%': 'mean',
'75%': 'mean',
'max': 'max'
}
)
print(desc_df)
mean std min 25% 50% 75% max
am 0.433333 0.223607 0.000 0.333333 0.500000 0.500000 1.000
carb 3.100000 1.293135 1.000 2.250000 2.666667 4.083333 8.000
cyl 6.200000 0.636339 4.000 5.500000 6.000000 7.166667 8.000
disp 232.336667 40.954447 71.100 177.216667 195.233333 281.966667 472.000
drat 3.622833 0.161794 2.760 3.340417 3.649167 3.849583 4.930
gear 3.783333 0.239882 3.000 3.541667 3.916667 3.958333 5.000
hp 158.733333 44.053017 52.000 124.416667 139.333333 191.083333 335.000
mpg 19.753333 2.968229 10.400 16.583333 20.950000 23.133333 33.900
qsec 17.747000 0.868257 14.500 16.948333 17.808333 18.248333 22.900
vs 0.450000 0.102315 0.000 0.208333 0.416667 0.625000 1.000
wt 3.266900 0.598493 1.513 2.850417 3.042500 3.809583 5.424
I hope this was helpful.