
The command below shows some details about the dataframe.

df.describe()

It gives details about count, mean, std, min, 25%, ...

Is there a way to get the number of rows in a dataframe that fall at the 75% or 25% level?

Thanks.

Trenton McKinney
dxb

1 Answer

  • Use pandas.Series.quantile to determine the value for the given quantile of the selected column.
    • .quantile has the benefit of being able to specify any quantile value (e.g. 30%)
    • .describe() reports only [25%, 50%, 75%] by default (its percentiles parameter can change this), and it performs aggregations that are not needed here.
  • Select the specific data using Boolean selection, with .ge and .le
    • .ge is >=
    • .le is <=
    • .eq is ==
  • Once you have all the rows matching the criteria, use something like len(quartile_25) (number of rows) or quartile_25.count() (non-null count per column) to determine how many values meet the criteria.
  • col should be some column name as a string
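# rows whose value in col is at or above the 75th percentile
quartile_75 = df[df[col].ge(df[col].quantile(q=.75))]
# rows whose value in col is at or below the 25th percentile
quartile_25 = df[df[col].le(df[col].quantile(q=.25))]
# rows whose value in col equals the column maximum
max_ = df[df[col].eq(df[col].max())]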
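
As a rough end-to-end illustration of the selection and counting steps, here is a minimal sketch on made-up data. The dataframe contents and the column name "score" are assumptions for the example only and are not from the question; the quantile values rely on pandas' default linear interpolation.

```python
import pandas as pd

# Hypothetical sample data; the column name "score" is an assumption.
df = pd.DataFrame({"score": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
col = "score"

# Quantile values for the column (default linear interpolation).
q25 = df[col].quantile(q=0.25)
q75 = df[col].quantile(q=0.75)

# Boolean selection: rows at or below q25, and rows at or above q75.
quartile_25 = df[df[col].le(q25)]
quartile_75 = df[df[col].ge(q75)]

# Number of rows in each selection.
n_low = len(quartile_25)
n_high = len(quartile_75)

# The same count without building the subset: sum the Boolean mask.
n_low_from_mask = int(df[col].le(q25).sum())

print(q25, q75, n_low, n_high)
```

With these ten values the 25% and 75% quantiles come out at 3.25 and 7.75, so each selection holds three rows. Summing the Boolean mask is probably the lighter option when only the count is needed and the matching rows themselves are not.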
Trenton McKinney
  • Hi, could you take a look at this question https://stackoverflow.com/questions/70954791/identifying-statistical-outliers-with-pandas-groupby-and-reduce-rows-into-diffe – Aaditya Ura Feb 02 '22 at 11:36