
Slowly transitioning from Matlab to Python...

I have a list of this form:

nan = float('nan')
list1 = [[1, 2, nan], [3, 7, 8], [1, 1, 1], [10, -1, nan]]

and another list with the same number of items:

list2 = [1, 2, 3, 4]

I'm trying to extract the elements of list1 that contain no nan values, together with the corresponding elements of list2, i.e. the result should be:

list1_clean = [[3, 7, 8], [1, 1, 1]]
list2_clean = [2, 3]

In Matlab this is easily done with logical indexing.

In Python, I suspect a list comprehension of some form will do the trick, but I'm stuck at:

from numpy import isnan  # NumPy's isnan works element-wise on a whole row
list1_clean = [x for x in list1 if not any(isnan(x))]

which obviously is of no use for list2.

Alternatively, the following attempt at logical indexing does not work ("indices must be integers, not lists"):

idx = [any(isnan(x)) for x in list1]
list1_clean = list1[idx]
list2_clean = list2[idx]

I'm certain it's painfully trivial, but I can't figure it out. Any help appreciated!

IMK
  • To solve your ("indices must be integers, not lists") error, use `for x, y, in data`. You have a list inside a list so you must account for all lists. – Josh Jun 19 '13 at 15:53
  • You might want to use [numpy](http://numpy.scipy.org/) as its arrays are closer to what you might expect coming from Matlab (and are also more performant if you're doing heavy number crunching a la Matlab). – BrenBarn Jun 19 '13 at 17:03
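Following BrenBarn's suggestion, here is a minimal sketch of the Matlab-style approach with NumPy. It assumes all rows have the same length, so they fit in one 2-D float array; the names `a1`, `a2` and `keep` are illustrative only.

```python
import numpy as np

# The question's data as a 2-D float array; np.nan marks missing values.
a1 = np.array([[1, 2, np.nan], [3, 7, 8], [1, 1, 1], [10, -1, np.nan]])
a2 = np.array([1, 2, 3, 4])

# Boolean mask, one entry per row: True where the row contains no NaN.
# This plays the role of a Matlab logical index.
keep = ~np.isnan(a1).any(axis=1)

a1_clean = a1[keep]  # only the rows without NaN
a2_clean = a2[keep]  # the matching entries of a2

print(a1_clean.tolist())  # [[3.0, 7.0, 8.0], [1.0, 1.0, 1.0]]
print(a2_clean.tolist())  # [2, 3]
```

Boolean-array indexing is NumPy's direct counterpart of Matlab's `list1(idx, :)`; plain Python lists do not support it, which is why the question's `list1[idx]` fails.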

3 Answers


You can use `zip`, which pairs up the items at the same index in each of the iterables passed to it.

>>> from math import isnan
>>> list1 = [[1, 2, 'nan'], [3, 7, 8], [1, 1, 1], [10, -1,'nan']]
>>> list2 = [1, 2, 3, 4]
>>> out = [(x, y) for x, y in zip(list1, list2)
...        if not any(isnan(float(z)) for z in x)]

>>> out
[([3, 7, 8], 2), ([1, 1, 1], 3)]

Now unzip out to get the required output:

>>> list1_clean, list2_clean = map(list, zip(*out))
>>> list1_clean
[[3, 7, 8], [1, 1, 1]]
>>> list2_clean
[2, 3]
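One caveat with the unzip step: if no row survives the filter, `out` is empty, `zip(*out)` produces nothing, and unpacking it into two names raises `ValueError`. A sketch of a hypothetical helper, `split_clean`, that builds both lists in a single pass avoids that. It assumes the lists hold real float NaNs rather than the string `'nan'`.

```python
from math import isnan


def split_clean(rows, labels):
    """Hypothetical helper: keep rows without NaN plus their matching labels.

    Returns two lists. Unlike `map(list, zip(*out))`, this also works when
    every row is filtered out, because nothing has to be unpacked.
    """
    rows_clean, labels_clean = [], []
    for row, label in zip(rows, labels):
        if not any(isnan(v) for v in row):
            rows_clean.append(row)
            labels_clean.append(label)
    return rows_clean, labels_clean


nan = float('nan')
list1 = [[1, 2, nan], [3, 7, 8], [1, 1, 1], [10, -1, nan]]
list2 = [1, 2, 3, 4]

print(split_clean(list1, list2))  # ([[3, 7, 8], [1, 1, 1]], [2, 3])
print(split_clean([[nan]], [1]))  # ([], [])
```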

Help on `zip` (Python 2):

>>> print zip.__doc__
zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

Return a list of tuples, where each tuple contains the i-th element
from each of the argument sequences.  The returned list is truncated
in length to the length of the shortest argument sequence.

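You can use `itertools.izip` if you want a memory-efficient solution, as it returns an iterator instead of a list. (In Python 3, the built-in `zip` already returns an iterator and `izip` no longer exists.)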
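If you prefer something that reads like the Matlab logical indexing attempted in the question, the standard library's `itertools.compress(data, selectors)` lazily yields each item of `data` whose matching selector is true. A minimal sketch, assuming real float NaNs; note that the mask must mark the rows *without* NaN, whereas the question's `idx` marked the rows that contain one.

```python
from itertools import compress
from math import isnan

nan = float('nan')
list1 = [[1, 2, nan], [3, 7, 8], [1, 1, 1], [10, -1, nan]]
list2 = [1, 2, 3, 4]

# Build the "logical index" once: True where the row has no NaN.
mask = [not any(isnan(v) for v in row) for row in list1]

# compress() keeps data[i] wherever mask[i] is true, so the same mask
# can be applied to both lists, much like list1(mask) in Matlab.
list1_clean = list(compress(list1, mask))
list2_clean = list(compress(list2, mask))

print(list1_clean)  # [[3, 7, 8], [1, 1, 1]]
print(list2_clean)  # [2, 3]
```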

Ashwini Chaudhary
  • Thanks! One question, though: why use `any(isnan(float(z)) for z in x)` instead of `any(isnan(x))`? Another poster further down uses a similar construct, which seems redundant (and less readable) to me. Both work, but is there a good reason for the longer one? – IMK Jun 19 '13 at 16:37
  • @IMK You can also use `not any(map(isnan, x))` if you've defined `nan` as an actual float value. I used the string `'nan'`, so converting to `float` was required; otherwise `isnan` would have raised an error. – Ashwini Chaudhary Jun 19 '13 at 16:46
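To make the distinction in this exchange concrete: `math.isnan` accepts a single real number only, so `any(isnan(x))` on a whole row works only when `isnan` is NumPy's element-wise version. A minimal standard-library sketch (the helper name `has_nan` is illustrative):

```python
from math import isnan


def has_nan(row):
    """Return True if any entry of `row` is NaN.

    float() is applied first so that string entries such as 'nan' are
    handled; math.isnan itself raises TypeError for strings.
    """
    return any(isnan(float(v)) for v in row)


print(has_nan([1, 2, 'nan']))           # True
print(has_nan([3, 7, 8]))               # False
print(has_nan([10, -1, float('nan')]))  # True

# math.isnan is scalar-only: passing a whole list raises TypeError.
try:
    isnan([1.0, 2.0])
except TypeError as exc:
    print('TypeError:', exc)
```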

You can simply do this:

from numpy import isnan  # ~ below needs NumPy's element-wise isnan
ans = [(x, y) for x, y in zip(list1, list2) if all(~isnan(x))]

#[(array([ 3.,  7.,  8.]), 2), (array([ 1.,  1.,  1.]), 3)]

From there, you can extract each sequence with:

l1, l2 = zip(*ans) 

#l1 = (array([ 3.,  7.,  8.]), array([ 1.,  1.,  1.]))
#l2 = (2,3)

Using `izip` from the `itertools` module is recommended: it returns an iterator, which can save a lot of memory on large inputs. (In Python 3, the built-in `zip` already behaves this way.)

Instead of `~`, you can use `numpy.logical_not()`, which may be more readable.
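The choice matters because `~` is a logical NOT only on NumPy boolean arrays; on a plain Python `bool` it is bitwise inversion of the integer value. A short sketch, assuming NumPy is installed:

```python
import numpy as np

row = [3.0, np.nan, 8.0]

# On the boolean array returned by np.isnan, ~ is element-wise logical NOT,
# so it gives the same result as np.logical_not.
mask_tilde = ~np.isnan(row)
mask_not = np.logical_not(np.isnan(row))
print(mask_tilde.tolist())  # [True, False, True]
print(mask_not.tolist())    # [True, False, True]

# On a plain Python bool, ~ inverts the integer value (True == 1), so it is
# not a logical NOT; recent Python versions also emit a DeprecationWarning.
print(~True, not True)  # -2 False
```

This is why `all(~isnan(x))` is safe with NumPy's `isnan`, but `~` should not be used on the scalar result of `math.isnan`.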

Welcome to Python!

Saullo G. P. Castro
  • Thanks. I accepted the earlier answer since it's the same solution. Noted re. izip. Glad I switched to Python, but I still have some work to do. – IMK Jun 19 '13 at 16:39

This should work. We check whether a number is NaN using `math.isnan`.

We add an element to list1_clean (and the matching element to list2_clean) only if none of the values in that sublist is NaN. To check this, we use the `any` function, which returns True if any element of the iterable is true.

>>> list1 = [[1, 2, float('NaN')], [3, 7, 8], [1, 1, 1], [10, -1, float('NaN')]]
>>> list2 = [1, 2, 3, 4]
>>> from math import isnan
>>> list1_clean = [elem for elem in list1 if not any([isnan(element) for element in elem])]
>>> list1_clean
[[3, 7, 8], [1, 1, 1]]
>>> list2_clean = [list2[index] for index, elem in enumerate(list1) if not any([isnan(element) for element in elem])]
>>> list2_clean
[2, 3]

To shorten this and avoid using `zip`, you could do:

>>> cleanList = [(elem, list2[index]) for index, elem in enumerate(list1) if not any([isnan(element) for element in elem])]
>>> cleanList
[([3, 7, 8], 2), ([1, 1, 1], 3)]
>>> list1_clean = [elem[0] for elem in cleanList]
>>> list2_clean = [elem[1] for elem in cleanList]

Help on `any`:

any(...)
    any(iterable) -> bool

    Return True if bool(x) is True for any x in the iterable.
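With a generator expression (rather than the list comprehension used in this answer), `any` stops at the first NaN it finds. `not any(isnan(v) for v in row)` is logically the same test as `all(not isnan(v) for v in row)`, which is what the second answer's `all(~isnan(x))` expresses with NumPy. A small check of that equivalence, including the edge case of an empty row, which both forms treat as NaN-free:

```python
from math import isnan

nan = float('nan')
rows = [[1, 2, nan], [3, 7, 8], []]

for row in rows:
    via_any = not any(isnan(v) for v in row)  # "no element is NaN"
    via_all = all(not isnan(v) for v in row)  # "every element is a number"
    # De Morgan's law: the two tests always agree. For an empty row,
    # any() is False and all() is True, so both keep the row.
    assert via_any == via_all
    print(row, via_any)
```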

Help on `math.isnan`:

isnan(...)
    isnan(x) -> bool

    Check if float x is not a number (NaN).
Sukrit Kalra