
I have a massive DataFrame, and I was wondering if there was a short (one or two liner) way to get a count of non-NaN entries in a DataFrame. I don't want to do this one column at a time as I have close to 1000 columns.

import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

    a   b   d
A   1   2 NaN
B NaN   4 NaN
C   5 NaN   7
D   5 NaN NaN

Output:

a: 3
b: 2
d: 1
Mykola Zotko
cryp

4 Answers


The count() method returns the number of non-NaN values in each column:

>>> df1.count()
a    3
b    2
d    1
dtype: int64

Similarly, count(axis=1) returns the number of non-NaN values in each row.
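For instance, with the question's `df1`, the row-wise variant looks like this:

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Count non-NaN values per row instead of per column
print(df1.count(axis=1))
# A    2
# B    1
# C    2
# D    1
# dtype: int64
```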

Alex Riley
  • I don't believe that works if the column has strings – DISC-O Jul 15 '21 at 19:30
  • @DISC-O: just tried and it works for me (pandas version 1.2.1). E.g. `df = pd.DataFrame({"a": ["x", np.nan, "z"]})` then `df.count()` produces the expected value `2`. Do you have an example where this method does not work? – Alex Riley Jul 15 '21 at 21:01
  • 2
    yes, if you manually create a df and place the np.nan it could work I guess. But that is not how you typically create your columns. One often used way, by me at least is: df['C'] =np.where(df.A>df.B,'some text',np.nan). This turns the np.nan into 'nan' and is no longer recognized as nan. – DISC-O Jul 17 '21 at 00:03
  • I have a column with None values and this doesn't work – West Nov 10 '22 at 09:20
  • @DISC-O (very late reply, apologies) - in that example you don't end up with any NaN values in the column (you have a column of string values) so the `.count()` method works as intended. Some NumPy methods, especially with strings, don't fit well with pandas and that's one of them so it's better to use pandas methods like `df["C"] = (df.A > df.B).map({True: 'some text', False: np.nan})` instead. – Alex Riley Nov 10 '22 at 11:32
  • @West: `.count()` should treat `None` as a null value and count it - happy to debug if you give an example of what you're seeing. – Alex Riley Nov 10 '22 at 11:33

If you want the total count of non-NaN values across the whole DataFrame, you can do:

np.sum(df.count())
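Equivalently (a minor stylistic alternative, not from the original answer), chaining pandas' own `sum` avoids the NumPy call. With the question's `df1`, both give 6:

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Sum the per-column non-NaN counts into one grand total
print(df1.count().sum())  # 6
```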
hemanta

If you are dealing with empty strings, you may want to count them as NA as well:

df.replace('', np.nan).count()

or, if you also want to treat whitespace-only strings as NA:

df.replace(r'^\s*$', np.nan, regex=True).count()
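A small sketch of the difference (the frame here is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a real value, an empty string,
# a whitespace-only string, and a true null
df = pd.DataFrame({'a': ['x', '', '  ', None]})

print(df.count())  # '' and '  ' still count as values -> a: 3

# Treat empty and whitespace-only strings as NA before counting
print(df.replace(r'^\s*$', np.nan, regex=True).count())  # a: 1
```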
Skippy le Grand Gourou

You can use the notna / notnull methods together with sum:

df.notna().sum()

Output:

a    3
b    2
d    1
dtype: int64
Mykola Zotko