
I have a DataFrame like:

    a  b
0   4  7
1   3  2
2   1  9
3   3  4
4   2  NaN

I need to calculate the min, mean, std, and sum over the whole DataFrame, treating it as a single list of numbers (e.g. the minimum here is 1).

EDIT: The data may contain NaNs, and the columns may have different sizes.

df.to_numpy().mean()

This produces NaN, because the arrays contain NaNs and have different lengths.

How can I calculate the usual statistics over all of these numbers?

  • Did you try something like `df['a'].to_numpy().concat(df['b'].to_numpy())`? I am not sure if it works, but I think you get the spirit. – Eloi Nov 13 '22 at 21:52
  • Yes, but the lists are different sizes and have NaN inside, so the results are completely wrong. – gotiredofcoding Nov 14 '22 at 06:49
  • I also tried `profits.to_numpy().mean()`, which produces NaN. – gotiredofcoding Nov 14 '22 at 06:54
  • This question is similar to this one and so the answers there may help: [What's the best way to sum all values in a Pandas dataframe](https://stackoverflow.com/q/38733477/1609514) – Bill Nov 14 '22 at 07:01

1 Answer


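A pandas solution is to reshape with DataFrame.stack and then aggregate with Series.agg. The custom function sets ddof=0 because pandas' std defaults to ddof=1, while NumPy's defaults to ddof=0: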

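def std_ddof0(x):
    # ddof=0 gives the population standard deviation, matching NumPy's default
    return x.std(ddof=0)

# stack() reshapes the DataFrame into a single Series before aggregating
out = df.stack().agg(['mean', 'sum', std_ddof0, 'min'])
print(out)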
mean          3.888889
sum          35.000000
std_ddof0     2.424158
min           1.000000
dtype: float64
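For readers who want to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's sample frame (the column values are copied from the question) and, as a simplification, calls the reductions directly on the stacked Series instead of going through Series.agg. It relies on pandas reductions skipping NaN by default.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; the missing value in column b is NaN.
df = pd.DataFrame({"a": [4, 3, 1, 3, 2], "b": [7, 2, 9, 4, np.nan]})

# stack() reshapes the frame into one long Series.
s = df.stack()

# pandas reductions skip NaN by default (skipna=True).
mean = s.mean()
total = s.sum()
std = s.std(ddof=0)  # ddof=0 matches NumPy's default
minimum = s.min()

print(mean, total, std, minimum)
```

Calling the methods one by one gives up the labelled Series that agg returns, but it avoids passing a custom callable through agg.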

A NumPy solution uses np.nanmean, np.nansum, np.nanstd and np.nanmin, which ignore NaN values:

import numpy as np

# flatten all columns into one 1-D array
totalp = df.to_numpy().reshape(-1)

# the nan* functions ignore NaN values
out = np.nanmean(totalp), np.nansum(totalp), np.nanstd(totalp), np.nanmin(totalp)
print(out)
(3.888888888888889, 35.0, 2.4241582476968255, 1.0)

Another idea is to remove the missing values first:

totalp = df.to_numpy().reshape(-1)
# keep only the entries that are not NaN
totalp = totalp[~np.isnan(totalp)]
print(totalp)
[4. 7. 3. 2. 1. 9. 3. 4. 2.]

out = np.mean(totalp), np.sum(totalp), np.std(totalp), np.min(totalp)
print(out)
(3.888888888888889, 35.0, 2.4241582476968255, 1.0)
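The question's edit also mentions columns of different sizes. If the data has not been loaded into a DataFrame yet, one possible sketch is to concatenate the raw columns into a single flat array and apply the same NaN-aware functions. The helper name `flat_stats` and the sample `columns` dict below are hypothetical, made up for illustration and not taken from the question.

```python
import numpy as np

def flat_stats(columns):
    """Return min, mean, std and sum over all values in a dict of sequences."""
    # Join every column into one flat float array, whatever its length.
    flat = np.concatenate([np.asarray(c, dtype=float) for c in columns.values()])
    # The nan-aware reductions ignore NaN entries.
    return {
        "min": np.nanmin(flat),
        "mean": np.nanmean(flat),
        "std": np.nanstd(flat),  # ddof=0 by default
        "sum": np.nansum(flat),
    }

# Hypothetical data: columns of unequal length, one NaN.
columns = {"a": [4, 3, 1, 3, 2], "b": [7, 2, 9, np.nan]}
stats = flat_stats(columns)
print(stats)
```

Concatenating the raw columns means no padding with NaN is needed to make them the same length.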
jezrael