2

Does Pandas have an equivalent of R's NA (meaning "not available")? If not, what is the convention for representing a missing value, as opposed to NaN, which represents a mathematically undefined result such as 0/0?

cbare

3 Answers

3

At the time of writing, there is no NA value available in Pandas or NumPy. From the section "Working with missing data" in the Pandas manual (http://pandas.pydata.org/pandas-docs/stable/missing_data.html):

The choice of using NaN internally to denote missing data was largely for simplicity and performance reasons. It differs from the MaskedArray approach of, for example, scikits.timeseries. We are hopeful that NumPy will soon be able to provide a native NA type solution (similar to R) performant enough to be used in pandas.

Also, this part of the documentation (http://pandas.pydata.org/pandas-docs/stable/gotchas.html#nan-integer-na-values-and-na-type-promotions) provides more details on the trade-offs in this choice of NA representation.
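
To illustrate the trade-off mentioned above, here is a minimal sketch (the variable names are made up for illustration) of how introducing a missing value into an integer Series makes pandas fall back to NaN and promote the dtype to float:

```python
import numpy as np
import pandas as pd

# A Series built from Python ints gets an integer dtype.
s = pd.Series([1, 2, 3])
print(s.dtype)

# Reindexing to a label that does not exist introduces a missing entry.
# pandas fills it with NaN, which forces promotion to float64.
s2 = s.reindex([0, 1, 2, 3])
print(s2.dtype)
print(s2.isna().tolist())
```

The last line should report only the final entry as missing; isna() is the usual way to test for it, since NaN never compares equal to anything.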

Jon Lund Steffensen
  • See also [NumPy or Pandas: Keeping array type as integer while having a NaN value](http://stackoverflow.com/questions/11548005/numpy-or-pandas-keeping-array-type-as-integer-while-having-a-nan-value/11548224#11548224) – smci Dec 06 '14 at 02:32
  • As of 2022, you can use pandas.NA to represent a missing value. – schiavuzzi Apr 12 '22 at 12:00
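
A short sketch of what the last comment describes, assuming pandas 1.0 or later (where pandas.NA exists) and using the nullable "Int64" dtype so the integers are not promoted to float:

```python
import pandas as pd

# "Int64" (capital I) is the nullable integer dtype; None is stored as pd.NA.
s = pd.Series([1, None, 3], dtype="Int64")
print(s.dtype)

# The missing entry is the pd.NA singleton rather than a float NaN.
print(s.iloc[1] is pd.NA)
print(s.isna().tolist())

# Reductions skip missing values by default.
print(s.sum())
```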
1

It comes from NumPy:

from numpy import nan
x = nan  # NaN doubles as the missing-value marker in pandas

Greg
1

You can use it from numpy:

import numpy as np
np.nan

or simply

float('NaN')

The pandas docs mostly use the np.nan version: http://pandas.pydata.org/pandas-docs/dev/missing_data.html
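
One thing to keep in mind with either spelling: NaN never compares equal to itself, so an equality test will not find it. A rough sketch (variable names are illustrative only) of checking for missing values instead:

```python
import math

import numpy as np
import pandas as pd

x = float("nan")

# Equality is always False for NaN, even against itself.
print(x == x)

# Use a dedicated check instead.
print(math.isnan(np.nan))
print(pd.isna(x))

# In a Series, isna() flags the missing entries, and reductions skip them.
s = pd.Series([1.0, np.nan, 3.0])
print(s.isna().tolist())
print(s.mean())
```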

tamasgal