
I have a massive DataFrame, and I was wondering if there was a short (one or two liner) way to get a count of non-NaN entries in a DataFrame. I don't want to do this one column at a time as I have close to 1000 columns.

import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

    a   b   d
A   1   2 NaN
B NaN   4 NaN
C   5 NaN   7
D   5 NaN NaN

Output:

a: 3
b: 2
d: 1
Mykola Zotko
cryp

4 Answers


The count() method returns the number of non-NaN values in each column:

>>> df1.count()
a    3
b    2
d    1
dtype: int64

Similarly, count(axis=1) returns the number of non-NaN values in each row.
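For instance, with the question's `df1`, the row-wise variant looks like this:

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Count non-NaN values per row instead of per column
print(df1.count(axis=1))
# A    2
# B    1
# C    2
# D    1
# dtype: int64
```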

Alex Riley
  • I don't believe that works if the column has strings – DISC-O Jul 15 '21 at 19:30
  • @DISC-O: just tried and it works for me (pandas version 1.2.1). E.g. `df = pd.DataFrame({"a": ["x", np.nan, "z"]})` then `df.count()` produces the expected value `2`. Do you have an example where this method does not work? – Alex Riley Jul 15 '21 at 21:01
  • 2
    yes, if you manually create a df and place the np.nan it could work I guess. But that is not how you typically create your columns. One often used way, by me at least is: df['C'] =np.where(df.A>df.B,'some text',np.nan). This turns the np.nan into 'nan' and is no longer recognized as nan. – DISC-O Jul 17 '21 at 00:03
  • I have a column with None values and this doesn't work – West Nov 10 '22 at 09:20
  • @DISC-O (very late reply, apologies) - in that example you don't end up with any NaN values in the column (you have a column of string values) so the `.count()` method works as intended. Some NumPy methods, especially with strings, don't fit well with pandas and that's one of them so it's better to use pandas methods like `df["C"] = (df.A > df.B).map({True: 'some text', False: np.nan})` instead. – Alex Riley Nov 10 '22 at 11:32
  • @West: `.count()` should treat `None` as a null value and count it - happy to debug if you give an example of what you're seeing. – Alex Riley Nov 10 '22 at 11:33

If you want the total count of non-NaN values across the whole DataFrame, you can do:

np.sum(df.count())
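Equivalently (a minor stylistic alternative, not from the original answer), chaining pandas' own `sum` avoids the NumPy call. With the question's `df1`, both give 6:

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, None), (None, 4, None), (5, None, 7), (5, None, None)],
                   columns=['a', 'b', 'd'], index=['A', 'B', 'C', 'D'])

# Sum the per-column non-NaN counts into one grand total
print(df1.count().sum())  # 6
```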
hemanta

If you are dealing with empty strings, you may want to count them as NA as well:

df.replace('', np.nan).count()

or, if you also want to treat whitespace-only strings as NA:

df.replace(r'^\s*$', np.nan, regex=True).count()
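A small sketch of the difference (the frame here is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a real value, an empty string,
# a whitespace-only string, and a true null
df = pd.DataFrame({'a': ['x', '', '  ', None]})

print(df.count())  # '' and '  ' still count as values -> a: 3

# Treat empty and whitespace-only strings as NA before counting
print(df.replace(r'^\s*$', np.nan, regex=True).count())  # a: 1
```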
Skippy le Grand Gourou

You can use the notna / notnull methods together with sum:

df.notna().sum()

Output:

a    3
b    2
d    1
dtype: int64
Mykola Zotko