Pandas DataFrame max() and apply(max) return different values when there are missing values in the data

Question

I'm going through the basics of data manipulation with pandas, and while working on one of the exercises I noticed some strange behavior of the max() method when the data contains missing values. Here is a toy example.

First, create some toy data:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': [1, np.nan], 'B': [np.nan, 1]})

It is a 2x2 DataFrame. The columns differ only in where the missing value sits: column A has NaN in the second row, and column B has NaN in the first row.

         A    B
    0  1.0  NaN
    1  NaN  1.0

Now I try to find the maximum value in each column in three different ways.

Calling the DataFrame.max() method:
```
df.max()
```
It gives the result I expected:
```
A    1.0
B    1.0
dtype: float64
```
Calling DataFrame.apply() with the built-in max function as the argument:
```
df.apply(max)
```
The result is
```
A    1.0
B    NaN
dtype: float64
```
What is unexpected here is that the maximum of column B is reported as NaN. I assume the cause is the NaN value in the first row.

Calling DataFrame.apply() with the string 'max' as the argument:
```
df.apply('max')
```
Here the result is as expected:
```
A    1.0
B    1.0
dtype: float64
```
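To probe the assumption that the leading NaN in column B is the cause, here is a minimal sketch that uses only plain Python lists, with no pandas involved. The variable names are illustrative, and the sketch assumes that df.apply(max) ends up running the built-in max over each column's values in row order; that would be consistent with the output above, but it is not confirmed here.

```python
import math

nan = float("nan")

# Every ordering comparison that involves NaN evaluates to False.
nan_greater = nan > 1.0
one_greater = 1.0 > nan

# The built-in max keeps its first item and replaces it only when a
# later item compares greater, so a leading NaN is never replaced.
nan_first = max([nan, 1.0])  # same order as column B
nan_last = max([1.0, nan])   # same order as column A

print(nan_greater, one_greater)  # False False
print(math.isnan(nan_first))     # True
print(nan_last)                  # 1.0
```

The order of the values decides the outcome, which matches the difference between columns A and B. DataFrame.max(), by contrast, skips missing values by default (skipna=True).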

Why is the result of the second approach different from the other two?
