3

Normally, NaN (not a number) propagates through calculations, so I don't need to check for NaN in each step. This works almost always, but apparently there are exceptions. For example:

>>> nan = float('nan')
>>> pow(nan, 0)
1.0

I found the following comment on this:

The propagation of quiet NaNs through arithmetic operations allows errors to be detected at the end of a sequence of operations without extensive testing during intermediate stages. However, note that depending on the language and the function, NaNs can silently be removed in expressions that would give a constant result for all other floating-point values e.g. NaN^0, which may be defined as 1, so in general a later test for a set INVALID flag is needed to detect all cases where NaNs are introduced.

To satisfy those wishing a more strict interpretation of how the power function should act, the 2008 standard defines two additional power functions; pown(x, n) where the exponent must be an integer, and powr(x, y) which returns a NaN whenever a parameter is a NaN or the exponentiation would give an indeterminate form.
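The cases the quote alludes to can be reproduced directly. The following is a minimal sketch, assuming CPython 3 on a platform with IEEE 754 doubles; the name `silent_cases` is only illustrative. It collects a few expressions in which a NaN operand does not yield a NaN result.

```python
import math

nan = float("nan")

# Expressions where a NaN operand is silently dropped.
# x ** 0 and 1 ** y are defined as 1.0 for every y/x, including NaN.
silent_cases = {
    "nan ** 0": nan ** 0,
    "pow(nan, 0)": pow(nan, 0),
    "math.pow(nan, 0)": math.pow(nan, 0),
    "1.0 ** nan": 1.0 ** nan,
    "math.pow(1.0, nan)": math.pow(1.0, nan),
}

for expression, value in silent_cases.items():
    print(expression, "->", value)

# max() compares with ">", and every comparison with NaN is False,
# so the outcome depends on argument order.
print(max(1.0, nan))  # the NaN is lost
print(max(nan, 1.0))  # the NaN survives
```

So `pow` is not the only place to watch: anything built on comparisons (`max`, `min`, sorting) can also lose a NaN.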

Is there a way to check the INVALID flag mentioned above through Python? Alternatively, is there any other approach to catch cases where NaN does not propagate?

Motivation: I decided to use NaN for missing data. In my application, missing inputs should produce a missing result. It works great, with the exception I described.

max

4 Answers

3

I realise that a month has passed since this was asked, but I've come across a similar problem (i.e. pow(float('nan'), 1) throws an exception in some Python implementations, e.g. Jython 2.52b2), and I found the above answers weren't quite what I was looking for.

Using a MissingData type as suggested by 6502 seems like the way to go, but I needed a concrete example. I tried Ethan Furman's NullType class but found that it didn't work with mixed-type arithmetic operations, as it doesn't coerce data types (see below). I also didn't like that it explicitly names each arithmetic function that is overridden.

Starting with Ethan's example and tweaking code I found here, I arrived at the class below. Although the class is heavily commented you can see that it actually only has a handful of lines of functional code in it.

The key points are:

1. Use __coerce__() to return two NoData objects for mixed-type arithmetic operations (e.g. NoData + float), and two strings for string-based operations (e.g. concatenation).
2. Use __getattr__() to return a callable NoData() object for all other attribute/method access.
3. Use __call__() to implement all other methods of the NoData() object, by returning a NoData() object.

Here are some examples of its use.

>>> nd = NoData()
>>> nd + 5
NoData()
>>> pow(nd, 1)
NoData()
>>> math.pow(NoData(), 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: nb_float should return float object
>>> nd > 5
NoData()
>>> if nd > 5:
...     print "Yes"
... else:
...     print "No"
... 
No
>>> "The answer is " + nd
'The answer is NoData()'
>>> "The answer is %f" % (nd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: float argument required, not instance
>>> "The answer is %s" % (nd)
'The answer is '
>>> nd.f = 5
>>> nd.f
NoData()
>>> nd.f()
NoData()

I noticed that using pow with NoData() calls the ** operator and hence works with NoData, but using math.pow does not as it first tries to convert the NoData() object to a float. I'm happy using the non math pow - hopefully 6502 etc were using math.pow when they had problems with pow in their comments above.

The other issue I can't think of a way of solving is the use with the format (%f) operator... No methods of NoData are called in this case, the operator just fails if you don't provide a float. Anyway here's the class itself.

class NoData():
    """NoData object - any interaction returns NoData()"""
    # Note: this relies on Python 2 old-style class behaviour
    # (__coerce__, and __getattr__ being consulted for operators).

    def __str__(self):
        #I want '' returned as it represents no data in my output (e.g. csv) files
        return ''

    def __unicode__(self):
        return ''

    def __repr__(self):
        return 'NoData()'

    def __coerce__(self, other_object):
        if isinstance(other_object, str) or isinstance(other_object, unicode):
            #Return string objects when coerced with another string object.
            #This ensures that e.g. concatenation operations produce strings.
            return repr(self), other_object
        else:
            #Otherwise return two NoData objects - these will then be passed to the appropriate
            #operator method for NoData, which should then return a NoData object
            return self, self

    def __nonzero__(self):
        #__nonzero__ is the operation that is called whenever, e.g. "if NoData:" occurs,
        #i.e. as all operations involving NoData return NoData, whenever a
        #NoData object propagates to a test in a branch statement.
        return False

    def __hash__(self):
        #prevent NoData() from being used as a key for a dict or used in a set
        raise TypeError("Unhashable type: " + repr(self))

    def __setattr__(self, name, value):
        #This is overridden to prevent any attributes from being created on NoData when e.g. "NoData().f = x" is called
        return None

    def __call__(self, *args, **kwargs):
        #if a NoData object is called (i.e. used as a method), return a NoData object
        return self

    def __getattr__(self, name):
        #For all other attribute accesses or method accesses, return a NoData object.
        #Remember that the NoData object can be called (__call__), so if a method is called,
        #a NoData object is first returned and then called.  This works for operators,
        #so e.g. NoData() + 5 will:
        # - call NoData().__coerce__, which returns a (NoData, NoData) tuple
        # - call __getattr__, which returns a NoData object
        # - call the returned NoData object with args (self, NoData)
        # - this call (i.e. __call__) returns a NoData object

        #For attribute accesses NoData will be returned, and that's it.

        #print name #(uncomment this line for debugging purposes i.e. to see that attribute was accessed/method was called)
        return self

jcdude
2

If it's just pow() giving you headaches, you can easily redefine it to return NaN under whatever circumstances you like.

def pow(x, y):
    return x ** y if x == x else float("NaN")

If NaN can be used as an exponent you'd also want to check for that; this raises a ValueError exception except when the base is 1 (apparently on the theory that 1 to any power, even one that's not a number, is 1).

(And of course pow() actually takes three operands, the third optional, which omission I'll leave as an exercise...)
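For what it's worth, here is one way the exercise might be filled in. This is only a sketch: `strict_pow` is a hypothetical name (chosen so the built-in `pow` is not shadowed), and it assumes that "NaN in, NaN out" is the rule you want for both base and exponent.

```python
def _is_nan(value):
    # NaN is the only float that is not equal to itself.
    return value != value


def strict_pow(x, y, z=None):
    """Like the built-in pow(), but a NaN base or exponent always gives NaN."""
    if _is_nan(x) or _is_nan(y):
        return float("nan")
    if z is None:
        return pow(x, y)
    # Three-argument form (modular exponentiation on integers).
    return pow(x, y, z)


nan = float("nan")
print(strict_pow(nan, 0))    # NaN instead of 1.0
print(strict_pow(1.0, nan))  # NaN instead of 1.0
print(strict_pow(2, 3))
print(strict_pow(2, 3, 5))
```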

Unfortunately the ** operator has the same behavior, and there's no way to redefine that for built-in numeric types. A possibility to catch this is to write a subclass of float that implements __pow__() and __rpow__() and use that class for your NaN values.
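A rough sketch of that subclass idea, written for Python 3; the class name `PropagatingNaN` is made up for illustration. Note the important limitation shown at the end: every other arithmetic operation still returns a plain float NaN, so the special behavior is lost after the first `+`, `*`, etc. unless those methods are overridden too.

```python
import math


class PropagatingNaN(float):
    """A NaN whose ** and pow() results stay NaN (illustrative sketch)."""

    def __new__(cls):
        return super().__new__(cls, "nan")

    def __pow__(self, other, mod=None):
        # NaN ** anything stays NaN, including NaN ** 0.
        return self

    def __rpow__(self, other, mod=None):
        # anything ** NaN stays NaN, including 1 ** NaN.
        return self


missing = PropagatingNaN()
print(math.isnan(missing ** 0))      # True
print(math.isnan(1.0 ** missing))    # True
print(math.isnan(pow(missing, 0)))   # True

# Limitation: float.__add__ returns an ordinary float NaN,
# which then behaves like any other NaN again.
print((missing + 1) ** 0)            # 1.0
```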

Python doesn't seem to provide access to any flags set by calculations; even if it did, it's something you'd have to check after each individual operation.

In fact, on further consideration, I think the best solution might be to simply use an instance of a dummy class for missing values. Python will choke on any operation you try to do with these values, raising an exception, and you can catch the exception and return a default value or whatever. There's no reason to proceed with the rest of the calculation if a needed value is missing, so an exception should be fine.
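A minimal sketch of the dummy-class approach in Python 3, assuming the calculation can be wrapped in a single function. `Missing`, `MISSING` and `average_growth` are hypothetical names used only for this example.

```python
class Missing:
    """Placeholder for a missing value; deliberately supports no arithmetic."""

    def __repr__(self):
        return "MISSING"


MISSING = Missing()


def average_growth(start, end, years):
    """Example calculation; any Missing operand aborts it with TypeError."""
    try:
        return (end / start) ** (1.0 / years) - 1.0
    except TypeError:
        # An operand was MISSING, so the result is missing as well.
        return MISSING


print(average_growth(100.0, 121.0, 2))
print(average_growth(MISSING, 121.0, 2))
print(average_growth(100.0, 121.0, MISSING))
```

One caveat: catching `TypeError` this broadly can also hide genuine bugs, such as passing a string by mistake. Comparisons with `==` do not raise either, so they will not be caught this way.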

kindall
  • I don't see how that works. `NaN != NaN` so your `if` is always going to be true. – Duncan Apr 05 '12 at 19:18
  • Just replace `x != NaN` with `x == x`. – max Apr 05 '12 at 19:19
  • And I'm not sure; maybe `pow` is the only one, maybe it's not... I guess using `NaN` for missing data, neat as it sounds, is not really practical... :( – max Apr 05 '12 at 19:20
  • Good call, forgot about that behavior of `NaN`. – kindall Apr 05 '12 at 19:21
  • This doesn't work -- probably because x!=NaN will always evaluate to True. (`nan != nan` according to the IEEE standard). nan does propagate as long as the exponent is not 0...apparently the library takes the approach that x**0=1 no matter what x is... The way that I usually check for nan's is using numpy.isnan(x). – mgilson Apr 05 '12 at 19:22
  • Don't forget that `pow` takes an optional third argument! – Gareth Rees Apr 05 '12 at 19:22
2

Why use NaN, which already has its own semantics, instead of an instance of a MissingData class that you define yourself?

Defining operations on MissingData instances to get propagation should be easy...
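As a rough illustration for Python 3 (a sketch only; the operator list is an assumption and is not exhaustive), the propagating methods can be attached in a loop instead of being written out one by one:

```python
class MissingData:
    """Missing value that propagates through arithmetic (illustrative sketch)."""

    __slots__ = ()

    def __repr__(self):
        return "MissingData()"

    def __bool__(self):
        # A missing value is treated as false in "if" tests.
        return False


def _propagate(self, *args):
    # Whatever the other operand is, the result is still missing.
    return self


# Binary operators: install both the normal and the reflected method.
for _name in ("add", "sub", "mul", "truediv", "floordiv", "mod", "pow"):
    setattr(MissingData, "__%s__" % _name, _propagate)
    setattr(MissingData, "__r%s__" % _name, _propagate)

# Unary operators.
for _name in ("neg", "pos", "abs"):
    setattr(MissingData, "__%s__" % _name, _propagate)


m = MissingData()
print(m + 5, 5 + m, m ** 0, pow(2, m), abs(-m))
```

Because no `__float__` is defined, functions that insist on real floats, such as `math.pow`, raise `TypeError` rather than silently producing a number. Comparisons like `m > 5` also raise `TypeError` here, since no ordering methods are defined.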

6502
  • I can't believe I didn't think of this. Now with ABC, it won't even be that hard to define all the arithmetic operations, right? – max Apr 05 '12 at 19:27
  • Or as I suggested in my just-now edit to my own answer, don't even implement any operations on the `MissingData` class. Just let Python raise whatever exception when you try to use one of those objects in a calculation, catch it, and provide the default value. – kindall Apr 05 '12 at 19:34
  • I actually want the operations on MissingValue because an exception would have to be caught at every intermediate calculation, which is a bit too much work. It's far better to simply let the MissingValue propagate, and then have MissingValue populate the resulting dataset. – max Apr 05 '12 at 19:40
  • Yes, I was assuming that the calculations happen in a block or could easily be arranged to do so. – kindall Apr 05 '12 at 20:05
  • Unfortunately it looks like the `pow()` function doesn't actually call the `__pow__()` special method on the class (only `x ** y` will call `x.__pow__()`). So you're probably still going to be rewriting that, and `abs()`, and a fair number of other built-in numeric functions. – kindall Apr 05 '12 at 20:07
2

To answer your question: No, there is no way to check the flags using normal floats. You can use the Decimal class, however, which provides much more control . . . but is a bit slower.
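A short sketch of what that control looks like with the standard `decimal` module, assuming Python 3. The invalid-operation trap is switched off so that the operation returns NaN and sets a flag instead of raising; the variable names are illustrative.

```python
from decimal import Decimal, InvalidOperation, localcontext

with localcontext() as ctx:
    # Return NaN and record a flag instead of raising an exception.
    ctx.traps[InvalidOperation] = False
    ctx.clear_flags()

    result = Decimal(0) / Decimal(0)           # invalid: produces NaN
    nan_was_created = bool(ctx.flags[InvalidOperation])

    # Unlike float, a quiet Decimal NaN survives being raised to the power 0.
    still_nan = (Decimal("NaN") ** 0).is_nan()

print(result, nan_was_created, still_nan)
```

The flag stays set until `clear_flags()` is called, so it can be checked once at the end of a sequence of operations rather than after every step.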

Your other option is to use an EmptyData or Null class, such as this one:

import sys

class NullType(object):
    "Null object -- any interaction returns Null"
    def _null(self, *args, **kwargs):
        return self
    __eq__ = __ne__ = __ge__ = __gt__ = __le__ = __lt__ = _null
    __add__ = __iadd__ = __radd__ = _null
    __sub__ = __isub__ = __rsub__ = _null
    __mul__ = __imul__ = __rmul__ = _null
    __div__ = __idiv__ = __rdiv__ = _null
    __mod__ = __imod__ = __rmod__ = _null
    __pow__ = __ipow__ = __rpow__ = _null
    __and__ = __iand__ = __rand__ = _null
    __xor__ = __ixor__ = __rxor__ = _null
    __or__ = __ior__ = __ror__ = _null
    __divmod__ = __rdivmod__ = _null
    __truediv__ = __itruediv__ = __rtruediv__ = _null
    __floordiv__ = __ifloordiv__ = __rfloordiv__ = _null
    __lshift__ = __ilshift__ = __rlshift__ = _null
    __rshift__ = __irshift__ = __rrshift__ = _null
    __neg__ = __pos__ = __abs__ = __invert__ = _null
    __call__ = __getattr__ = _null

    def __divmod__(self, other):
        return self, self
    __rdivmod__ = __divmod__

    if sys.version_info[:2] >= (2, 6):
        __hash__ = None
    else:
        def __hash__(yo):
            raise TypeError("unhashable type: 'Null'")

    def __new__(cls):
        return cls.null
    def __nonzero__(yo):
        return False
    def __repr__(yo):
        return '<null>'
    def __setattr__(yo, name, value):
        return None
    def __setitem__(yo, index, value):
        return None
    def __str__(yo):
        return ''
NullType.null = object.__new__(NullType)
Null = NullType()

You may want to change the __repr__ and __str__ methods. Also, be aware that Null cannot be used as a dictionary key, nor stored in a set.

Ethan Furman