The issue is that np.nansum converts its input to a numpy array, so it effectively loses the column information (sum doesn't do this). As a result, the groupby doesn't get back any column information when constructing the output, so the output is just a Series of numpy arrays.
Specifically, the source code for np.nansum calls the _replace_nan function. In turn, the source code for _replace_nan checks if the input is an array, and converts it to one if it's not.
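You can see that coercion directly. Here's a minimal sketch (the tiny DataFrame is just for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': [2.0, 3.0]})

# pandas' own sum stays in pandas and keeps the labels
print(type(df.sum(axis=1)))         # <class 'pandas.core.series.Series'>

# np.nansum coerces the DataFrame to a bare ndarray first,
# so the index and column labels are gone from the result
print(type(np.nansum(df, axis=1)))  # <class 'numpy.ndarray'>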
All hope isn't lost though. You can easily replicate np.nansum with Pandas functions. Specifically, use sum followed by fillna:
df.groupby(df.columns, axis=1).sum().fillna(0)
The sum should ignore NaNs and just sum the non-null values. The only case where you'll get back a NaN is if all the values being summed are NaN, which is why fillna is required. Note that you could also do the fillna before the groupby, i.e. df.fillna(0).groupby....
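As a quick sanity check (a minimal sketch using the example DataFrame from below; groupby(..., axis=1) is assumed to be available, as in the rest of this answer), both orderings give the same result:
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 2, np.nan, 4],
                   [np.nan, np.nan, np.nan, 3, 3],
                   [np.nan, np.nan, -1, 2, np.nan]],
                  columns=list('aaabb'))

fill_after  = df.groupby(df.columns, axis=1).sum().fillna(0)
fill_before = df.fillna(0).groupby(df.columns, axis=1).sum()

# Passes silently: both orderings produce the same DataFrame
pd.testing.assert_frame_equal(fill_after, fill_before)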
If you really want to use np.nansum, you can recast each group as a pd.Series. This will likely impact performance, as constructing a Series is relatively expensive and you'll be doing it once per group:
df.groupby(df.columns, axis=1).apply(lambda x: pd.Series(np.nansum(x, axis=1), x.index))
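If that per-group Series construction turns out to matter, one possible alternative (a sketch, not part of the approach above) is to let np.nansum work on each group's plain array and assemble the frame once at the end:
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 2, np.nan, 4],
                   [np.nan, np.nan, np.nan, 3, 3],
                   [np.nan, np.nan, -1, 2, np.nan]],
                  columns=list('aaabb'))

# Iterate the column groups, run np.nansum on each sub-frame,
# and build a single DataFrame at the end instead of one Series per group
result = pd.DataFrame(
    {name: np.nansum(group, axis=1)
     for name, group in df.groupby(df.columns, axis=1)},
    index=df.index)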
Example Computations
For some example computations, I'll be using the following simple DataFrame, which includes NaN values (your example data doesn't):
df = pd.DataFrame([[1,2,2,np.nan,4],[np.nan,np.nan,np.nan,3,3],[np.nan,np.nan,-1,2,np.nan]], columns=list('aaabb'))
     a    a    a    b    b
0  1.0  2.0  2.0  NaN  4.0
1  NaN  NaN  NaN  3.0  3.0
2  NaN  NaN -1.0  2.0  NaN
Using sum without fillna:
df.groupby(df.columns, axis=1).sum()
     a    b
0  5.0  4.0
1  NaN  6.0
2 -1.0  2.0
Using sum and fillna:
df.groupby(df.columns, axis=1).sum().fillna(0)
     a    b
0  5.0  4.0
1  0.0  6.0
2 -1.0  2.0
Comparing to the fixed np.nansum method:
df.groupby(df.columns, axis=1).apply(lambda x: pd.Series(np.nansum(x, axis=1), x.index))
     a    b
0  5.0  4.0
1  0.0  6.0
2 -1.0  2.0