Sum all values in a dataframe

Question

I am trying to sum all the values in a dataframe into one number.

So for example with the dataframe

            BBG.XAMS.FUR.S_pnl_pos_cost  BBG.XAMS.MT.S_pnl_pos_cost
date                                                               
2015-03-23                    -0.674996                   -0.674997
2015-03-24                    82.704951                   11.868748
2015-03-25                   -11.027327                   84.160210
2015-03-26                   228.426675                 -131.901556
2015-03-27                   -99.744986                  214.579858

I would like the value 377.71658 returned.

I have tried df.sum() but that only sums by column.

Granted that there must be a solution that sums both columns as a single action, what's wrong with adding the sums for each column together as a second step? — Thijs van Dien, Sep 01 '15 at 20:57
Does this answer your question? [What's the best way to sum all values in a Pandas dataframe?](https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe) — AMC, Apr 24 '20 at 01:00

score 18 · Answer 1 · answered Sep 01 '15 at 20:59

18

I would do

>>> df.values.sum()
377.71658000000002

which drops down to the underlying numpy array, and is likely to be the fastest, if the frame is all-numeric. But there are lots of other options:

>>> %timeit df.values.sum()
100000 loops, best of 3: 6.27 µs per loop
>>> %timeit df.sum().sum()
10000 loops, best of 3: 109 µs per loop
>>> %timeit df.unstack().sum()
1000 loops, best of 3: 233 µs per loop
>>> %timeit df.stack().sum()
1000 loops, best of 3: 190 µs per loop

answered Sep 01 '15 at 20:59

DSM

342,061
65
592
494

2

Notice this fails if you have NaN as a value. You could replace NaN with zeros. df.fillna(0).values.sum() – Brig Jan 12 '17 at 15:31
https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe – AMC Apr 24 '20 at 01:00

Alexander · Answer 2 · 2023-03-29T23:56:08.037

Just sum the column sums:

df.sum().sum()

Or for better performance:

np.nansum(df)

Note that you need to use nansum to treat NaNs as zeroes for the purpose of summing them.

Timings:

# Create dataframe with 1m rows and 100 columns.
np.random.seed(0)
rows = 1_000_000
cols = 100
df = pd.DataFrame(np.random.randn(rows, cols))
# Add one thousand NaNs.
for row, col in zip(np.random.randint(0, rows, 1000),
                    np.random.randint(0, cols, 1000)):
    df.iat[row, col] = np.nan

%timeit np.nansum(df)
# 274 ms ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.fillna(0).to_numpy().sum()
# 974 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.sum().sum()
# 1.04 s ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> df.to_numpy().sum()
nan

>>> np.nansum(df)
5965.87530314851

https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe — AMC, Apr 24 '20 at 01:00

score 2 · Answer 3 · answered Oct 08 '18 at 07:11

2

If data-frame contains non-numeric data, and want to just get sum of integers, do:

df.sum(numeric_only=True).sum()

answered Oct 08 '18 at 07:11

U13-Forward

69,221
14
89
114

https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe – AMC Apr 24 '20 at 01:00

score 0 · Answer 4 · edited Dec 21 '21 at 13:50

0

To sum of integer/float columns in a dataframe use the below query:

df.sum()

To find the sum of a particular column values you can use column index:

d = df.sum()
d[2]

edited Dec 21 '21 at 13:50

Patrick

1,189
5
11
19

answered Dec 21 '21 at 05:56

Sangram Mohanty

22
5

score 0 · Answer 5 · edited Apr 02 '23 at 22:46

0

Another option:

np.array(df.iloc[:,1:).sum()

edited Apr 02 '23 at 22:46

MD Mushfirat Mohaimin

1,966
3
10
22

answered Mar 29 '23 at 19:11

Sajil Alakkalakath

171
1
6

Sum all values in a dataframe

5 Answers5

Linked

Related