18

I am trying to sum all the values in a dataframe into one number.

So for example with the dataframe

            BBG.XAMS.FUR.S_pnl_pos_cost  BBG.XAMS.MT.S_pnl_pos_cost
date                                                               
2015-03-23                    -0.674996                   -0.674997
2015-03-24                    82.704951                   11.868748
2015-03-25                   -11.027327                   84.160210
2015-03-26                   228.426675                 -131.901556
2015-03-27                   -99.744986                  214.579858

I would like the value 377.71658 returned.

I have tried df.sum() but that only sums by column.

halfer
  • 19,824
  • 17
  • 99
  • 186
Stacey
  • 4,825
  • 17
  • 58
  • 99
  • 1
    `print(df.sum().sum())?` – Padraic Cunningham Sep 01 '15 at 20:56
  • Granted that there must be a solution that sums both columns as a single action, what's wrong with adding the sums for each column together as a second step? – Thijs van Dien Sep 01 '15 at 20:57
  • Does this answer your question? [What's the best way to sum all values in a Pandas dataframe?](https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe) – AMC Apr 24 '20 at 01:00

5 Answers5

18

I would do

>>> df.values.sum()
377.71658000000002

which drops down to the underlying numpy array, and is likely to be the fastest, if the frame is all-numeric. But there are lots of other options:

>>> %timeit df.values.sum()
100000 loops, best of 3: 6.27 µs per loop
>>> %timeit df.sum().sum()
10000 loops, best of 3: 109 µs per loop
>>> %timeit df.unstack().sum()
1000 loops, best of 3: 233 µs per loop
>>> %timeit df.stack().sum()
1000 loops, best of 3: 190 µs per loop
DSM
  • 342,061
  • 65
  • 592
  • 494
  • 2
    Notice this fails if you have NaN as a value. You could replace NaN with zeros. df.fillna(0).values.sum() – Brig Jan 12 '17 at 15:31
  • https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe – AMC Apr 24 '20 at 01:00
6

Just sum the column sums:

df.sum().sum()

Or for better performance:

np.nansum(df)

Note that you need to use nansum to treat NaNs as zeroes for the purpose of summing them.

Timings:

# Create dataframe with 1m rows and 100 columns.
np.random.seed(0)
rows = 1_000_000
cols = 100
df = pd.DataFrame(np.random.randn(rows, cols))
# Add one thousand NaNs.
for row, col in zip(np.random.randint(0, rows, 1000),
                    np.random.randint(0, cols, 1000)):
    df.iat[row, col] = np.nan

%timeit np.nansum(df)
# 274 ms ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.fillna(0).to_numpy().sum()
# 974 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.sum().sum()
# 1.04 s ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> df.to_numpy().sum()
nan

>>> np.nansum(df)
5965.87530314851
Alexander
  • 105,104
  • 32
  • 201
  • 196
  • https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe – AMC Apr 24 '20 at 01:00
2

If data-frame contains non-numeric data, and want to just get sum of integers, do:

df.sum(numeric_only=True).sum()
U13-Forward
  • 69,221
  • 14
  • 89
  • 114
  • https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe – AMC Apr 24 '20 at 01:00
0

To sum of integer/float columns in a dataframe use the below query:

df.sum()

To find the sum of a particular column values you can use column index:

d = df.sum()
d[2]
Patrick
  • 1,189
  • 5
  • 11
  • 19
0

Another option:

np.array(df.iloc[:,1:).sum()
MD Mushfirat Mohaimin
  • 1,966
  • 3
  • 10
  • 22