
Ok, in trying to answer this question I came across something very strange.

import numpy as np

matrix = np.zeros(10000)
matrix[np.random.choice(10000, 100)] = np.random.rand(100)
matrix = matrix.reshape(10, 1000)

from scipy.sparse import lil_matrix
l = lil_matrix(matrix.T)
l.rows

Out: array([[], [], [], ..., [], [], []], dtype=object)

Ok, so I want to know which rows have data, so I tried:

np.any(l.rows)

Out: [8]

. . . what?

out = np.any(l.rows)
type(out)

Out: list

It's a list. With an 8 in it. Which seems . . . random. What is going on?

After playing around it seems it returns the first object in the array that's not [].

np.random.seed(9)
matrix = np.zeros(10000)
matrix[np.random.choice(10000, 100)] = np.random.rand(100)
matrix = matrix.reshape(10, 1000)

from scipy.sparse import lil_matrix
l = lil_matrix(matrix.T)
l.rows

Out: array([[], [], [5], ..., [], [], []], dtype=object)

np.any(l.rows)
Out: [5]

But np.any is supposed to return a boolean or an array of booleans, so this is a very strange result. Does anyone know why this happens?

hpaulj
Daniel F
  • Addendum: my google-fu is weak, but I can't find any error report for this on GitHub (which it should probably have, since indexing by the result will cause some *very* unintended outcomes). Not sure how to raise it myself though. – Daniel F Sep 11 '18 at 07:12
  • Isn't `[]` "auto-convertible" to false, and `8` to true? In other words, if you do `if np.any(l.rows):`, does this work "as expected" even though the underlying value is not true/false? – Lasse V. Karlsen Sep 11 '18 at 07:33
  • @LasseVågsætherKarlsen Yes, when passed as a boolean they are interpreted as expected. The problem is in this case if they were used to index, for example `l[np.any(l.rows)]` - granted that's an extremely naive application that I never should have thought would work in the first place. – Daniel F Sep 11 '18 at 07:42
  • No, I meant that if `np.any` returns the first "non-null" element, instead of a bool, is a "non-null" element auto-convertible to "true" if you use it in a boolean context? Meaning: `if np.any(l.rows): x`, will this execute `x` if `np.any(l.rows)` returns `8`? – Lasse V. Karlsen Sep 11 '18 at 10:09
  • Yes, you can test it yourself. – Daniel F Sep 11 '18 at 10:10
  • I don't have Python or numpy installed, my question was more alluding to "is this really a problem at all?" – Lasse V. Karlsen Sep 11 '18 at 10:11
  • @LasseVågsætherKarlsen Ahh, sorry. Forgot the `sparse-matrix` tag would ping some folks from other languages. Yes, it is a problem if you expect a boolean and use it for boolean indexing. If the `object` returned can be interpreted by the indexer as some other index than `True` (such as, for example, a list of integers), the resulting slice will point to the wrong location. – Daniel F Sep 11 '18 at 10:18
  • `l.rows` is an object array of lists. `np.any(x)` does `np.logical_or.reduce(x)`, which apparently for object arrays is evaluated as `x[0] or x[1] or x[2] or x[3] ...`. Python `or` short-circuits, returning the first truthy operand. – hpaulj Sep 11 '18 at 16:02
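
The short-circuit explanation in the last comment can be reproduced without NumPy at all. This is only a rough sketch: `rows` is a made-up stand-in for `l.rows`, and `functools.reduce` with `or` imitates the chained `x[0] or x[1] or ...` evaluation described above; it is not NumPy's actual implementation.

```python
from functools import reduce

# Hypothetical stand-in for `l.rows`: one list of column indices per row.
rows = [[], [], [5], [], [1, 7]]

# `a or b` evaluates to `a` when `a` is truthy and to `b` otherwise,
# so the chain hands back one of its operands rather than a bool.
first_truthy = reduce(lambda a, b: a or b, rows)

print(first_truthy)        # [5]
print(bool(first_truthy))  # True
print(any(rows))           # True (the built-in any() always returns a bool)
```

So the result is still truthy in an `if`, but it is the list itself, which is why using it as an index goes wrong.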

1 Answer


I found the bug report. Apparently it's been on the Easy Fix list since 2014, and someone finally started working on it last week.

Should have figured I'm not the first dummy to try something like that.

Also, the correct usage in this case would be:

l[l.rows.astype(bool)]
Out: 
<97x10 sparse matrix of type '<class 'numpy.float64'>'
    with 100 stored elements in LInked List format>
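
For anyone who would rather not rely on how an object array casts to bool, here is a minimal sketch of building the same mask explicitly. The `rows` array is a hand-built, hypothetical stand-in for `l.rows` (so SciPy isn't needed to run it), and `mask` / `nonempty` are just illustrative names.

```python
import numpy as np

# Hypothetical stand-in for `l.rows`: an object array holding one list per row.
rows = np.empty(5, dtype=object)
for i, cols in enumerate([[], [], [5], [], [1, 7]]):
    rows[i] = cols

# Test each row's length explicitly; this always yields a real boolean array.
mask = np.array([len(r) > 0 for r in rows], dtype=bool)

# Integer positions of the rows that hold data.
nonempty = np.flatnonzero(mask)

print(mask)      # [False False  True False  True]
print(nonempty)  # [2 4]

# The cast used in the answer should give the same mask.
print(rows.astype(bool))
```

Either form gives a proper boolean mask, which is what the indexer needs.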
Daniel F