Does Pandas have an equivalent of R's NA (meaning "not available")? If not, what is the convention for representing a missing value, as opposed to NaN, which represents a mathematically undefined result such as zero divided by zero?
When this answer was written, there was no NA value available in pandas or NumPy. From the section "Working with missing data" in the pandas manual (http://pandas.pydata.org/pandas-docs/stable/missing_data.html):
The choice of using NaN internally to denote missing data was largely for simplicity and performance reasons. It differs from the MaskedArray approach of, for example, scikits.timeseries. We are hopeful that NumPy will soon be able to provide a native NA type solution (similar to R) performant enough to be used in pandas.
Also, this part of the documentation (http://pandas.pydata.org/pandas-docs/stable/gotchas.html#nan-integer-na-values-and-na-type-promotions) provides more details on the trade-offs in this choice of NA representation.
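The type-promotion trade-off described in that gotchas section can be observed directly. The following is a minimal sketch, assuming a reasonably recent pandas install; the series names are illustrative only. An integer Series that acquires a missing value is upcast to float64 so it can hold NaN, and a boolean Series is upcast to object.

```python
import pandas as pd

# int64 has no slot for NaN, so introducing a missing value forces an upcast.
ints = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype="int64")
# Reindexing adds label "d", which has no data and therefore becomes missing.
ints_reindexed = ints.reindex(["a", "b", "d"])
print(ints.dtype)               # int64
print(ints_reindexed.dtype)     # float64
print(ints_reindexed.tolist())  # [1.0, 2.0, nan]

# bool cannot hold NaN either; pandas falls back to the object dtype.
flags = pd.Series([True, False])
flags_reindexed = flags.reindex([0, 1, 2])
print(flags_reindexed.dtype)    # object

# pd.isna() is the usual way to locate the missing entries.
print(pd.isna(ints_reindexed).tolist())  # [False, False, True]
```

The upcast is what lets pandas reuse NaN as its single missing-value marker, at the cost of losing the original integer or boolean dtype.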

See also [NumPy or Pandas: Keeping array type as integer while having a NaN value](http://stackoverflow.com/questions/11548005/numpy-or-pandas-keeping-array-type-as-integer-while-having-a-nan-value/11548224#11548224) – smci Dec 06 '14 at 02:32
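Related to that linked question: newer pandas releases (1.0 and later) provide opt-in nullable extension dtypes such as "Int64" (capital I) together with an experimental pd.NA scalar, which keep integer data integral while still marking missing entries. This postdates the answer above, so treat the following as a sketch assuming pandas 1.0 or later, not as part of the original answer.

```python
import pandas as pd

# Nullable integer dtype: note the capital "I" in "Int64".
counts = pd.Series([1, 2, None], dtype="Int64")
print(counts.dtype)        # Int64
print(counts.tolist())     # [1, 2, <NA>]
print(counts[2] is pd.NA)  # True

# pd.NA propagates through arithmetic and comparisons.
print(pd.NA + 1)   # <NA>
print(pd.NA == 1)  # <NA>

# Reductions skip missing values by default.
print(counts.sum())  # 3
```

Unlike the NaN approach, the missing entry here does not force the Series to float64.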
You can use NaN from NumPy:
```python
import numpy as np
np.nan
```
or simply
```python
float('NaN')
```
The pandas documentation mostly uses the np.nan version: http://pandas.pydata.org/pandas-docs/dev/missing_data.html
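A minimal sketch, assuming a current pandas and NumPy install, showing that np.nan and float('NaN') are the same kind of value, that NaN never compares equal to itself (so == cannot find missing entries), and that pandas' isna() treats np.nan, float('NaN') and None alike in a float Series:

```python
import math

import numpy as np
import pandas as pd

a = np.nan
b = float("NaN")

# Both are ordinary Python floats holding an IEEE 754 NaN.
print(type(a), type(b))              # <class 'float'> <class 'float'>
print(math.isnan(a), math.isnan(b))  # True True

# NaN is never equal to anything, itself included.
print(a == a)  # False

# In a float Series, np.nan, float("NaN") and None all end up as NaN.
values = pd.Series([1.5, a, b, None])
print(values.dtype)            # float64
print(values.isna().tolist())  # [False, True, True, True]

# Aggregations skip NaN by default (skipna=True).
print(values.mean())  # 1.5
```

Because of the self-inequality, checks for missing data should go through pd.isna() or Series.isna() rather than comparisons with np.nan.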
