cryptic scipy "could not convert integer scalar" error

Question

I am constructing a sparse vector using a scipy.sparse.csr_matrix like so:

csr_matrix((values, (np.zeros(len(indices)), indices)), shape = (1, max_index))

This works fine for most of my data, but occasionally I get a ValueError: could not convert integer scalar.

This reproduces the problem:

In [145]: inds
Out[145]:
array([ 827969148,  996833913, 1968345558,  898183169, 1811744124,
        2101454109,  133039182,  898183170,  919293479,  133039089])

In [146]: vals
Out[146]:
array([ 1.,  1.,  1.,  1.,  1.,  2.,  1.,  1.,  1.,  1.])

In [147]: max_index
Out[147]:
2337713000

In [143]: csr_matrix((vals, (np.zeros(10), inds)), shape = (1, max_index+1))
...

    996         fn = _sparsetools.csr_sum_duplicates
    997         M,N = self._swap(self.shape)
--> 998         fn(M, N, self.indptr, self.indices, self.data)
    999 
    1000         self.prune()  # nnz may have changed

ValueError: could not convert integer scalar

inds is a np.int64 array and vals is a np.float64 array.

The relevant part of the scipy sum_duplicates code is here.
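For context, the traceback fails inside `_sparsetools.csr_sum_duplicates`, which merges repeated (row, column) entries while the CSR matrix is being built. The small sketch below runs that merging step on ordinary-sized data. It assumes a recent NumPy and scipy, and its column indices and values are illustrative rather than taken from the question. It also uses an integer row-index array instead of the float array that `np.zeros` returns by default:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative data: column 1 appears twice, so its two values get summed.
cols = np.array([1, 1, 4])
vals = np.array([1.0, 2.0, 3.0])
rows = np.zeros(len(cols), dtype=np.int64)  # every entry lives in row 0

# Same (data, (row, col)) constructor form as in the question, tiny shape.
vec = csr_matrix((vals, (rows, cols)), shape=(1, 5))

print(vec.toarray())  # column 1 holds 1.0 + 2.0, column 4 holds 3.0
print(vec.nnz)        # number of stored entries after duplicates are merged
```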

Note that this works:

In [235]: csr_matrix(([1,1], ([0,0], [1,2])), shape = (1, 2**34))
Out[235]:
<1x17179869184 sparse matrix of type '<type 'numpy.int64'>'
    with 2 stored elements in Compressed Sparse Row format>

So the problem is not that one of the dimensions is > 2**31.

Any thoughts why these values should be causing a problem?

please try using `2**31` and `2**31-1` with the same example you posted. — matiasg, Mar 20 '15 at 15:27
yes, the sample data I posted works with `2**31-1` but not 2**31 — Rok, Mar 20 '15 at 15:40
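The comments point at the signed 32-bit limit, 2**31-1. A quick arithmetic check with the numbers from the question (pure Python, no scipy needed) shows that every column index fits in a signed 32-bit integer while the failing column count does not. This is consistent with the `inds.max()+1` observation later in the thread, although it does not by itself show which scipy code path chooses the index dtype:

```python
# Largest value a signed 32-bit integer (e.g. numpy.int32) can hold.
INT32_MAX = 2**31 - 1

# Column indices and max_index copied from the question.
inds = [827969148, 996833913, 1968345558, 898183169, 1811744124,
        2101454109, 133039182, 898183170, 919293479, 133039089]
max_index = 2337713000

n_cols_failing = max_index + 1   # shape used in the failing call
n_cols_working = max(inds) + 1   # shape reported to work in the thread

print(max(inds) <= INT32_MAX)       # True: every index fits in int32
print(n_cols_failing <= INT32_MAX)  # False: the failing column count does not
print(n_cols_working <= INT32_MAX)  # True: the working column count does
```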

score 1 · Answer 1 · answered Mar 20 '15 at 14:30

Might it be that max_index > 2**31? Try this, just to make sure:

csr_matrix((vals, (np.zeros(10), inds // 2)), shape = (1, max_index // 2))

matiasg

yes, this was my first thought as well -- but it works with other similar data using the same `max_index` – Rok Mar 20 '15 at 14:35
no, `scipy.sparse.csr_matrix` works fine with `max_index > 2**31` -- see edited question. – Rok Mar 20 '15 at 14:39
@Rok I actually get a different exception (using Python 2.7 + scipy 0.9.0). I can construct the matrix with `2**31-1` but not with `2**31`. What scipy version are you using? – matiasg Mar 20 '15 at 14:41
@matiasg: scipy 0.15.1 installed using continuum anaconda – Rok Mar 20 '15 at 14:44
I installed Anaconda. They are using 64 bits for indices now, as I can construct a matrix with `2**63-1` but not with `2**63`. This is unrelated to your problem, then, but it seems a bit annoying. – matiasg Mar 20 '15 at 15:21
But then again, in your example the limit seems to be `2**31-1`. I now get the same exception as you, with the same example you posted. But if I use `shape=(1, 2**31-1)` it works fine. – matiasg Mar 20 '15 at 15:25

score 0 · Answer 2 · answered Mar 20 '15 at 14:43

The max index you are giving is less than the largest column index you are supplying.

This works fine for me: `sparse.csr_matrix((vals, (np.zeros(10), inds)), shape = (1, np.max(inds)+1))`

However, calling `.todense()` on it results in a memory error because the matrix is so large.
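A rough size estimate shows why densifying fails: a dense float64 row stores 8 bytes per column whether or not the entry is zero. The helper below is a back-of-the-envelope sketch (the name `dense_nbytes` is hypothetical) that ignores NumPy's small per-array overhead:

```python
def dense_nbytes(shape, itemsize=8):
    """Bytes needed for a dense array of `shape` (itemsize 8 = float64)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * itemsize

# Shape from the answer: (1, np.max(inds) + 1), where np.max(inds) == 2101454109.
nbytes = dense_nbytes((1, 2101454110))
print(nbytes, "bytes, roughly", round(nbytes / 2**30, 1), "GiB")
```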

Ars3nous

well, no -- the max value in the index array is 2101454109 but `max_index` is 2337713001. When the dimension is too small, it throws a `ValueError: column index exceeds matrix dimensions` error. Though you are right that using `inds.max() +1` works. The plot thickens. – Rok Mar 20 '15 at 14:48
Oops, I counted one zero too few. BTW, for me anything greater than 2**32-1 does not work (your example fails). It throws a weird exception, NotImplementedError: Wrong number or type of arguments for overloaded function 'coo_tocsr'. I am using the Enthought student distribution, scipy version '0.13.3'. – Ars3nous Mar 20 '15 at 15:04
I guess your version is using 32-bit integers then? – Rok Mar 20 '15 at 15:38

TheIdealis · Answer 3 · 2015-08-06T15:42:25.013

Commenting out the `sum_duplicates` function will lead to other errors. However, the fix from the question "strange error when creating csr_matrix" also solves your problem. You can extend the version check to newer versions of scipy.

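import scipy
import scipy.sparse

# Workaround for the affected scipy releases: wrap get_index_dtype so the
# 'check_contents' keyword is dropped before the original is called.
if scipy.__version__ in ("0.14.0", "0.14.1", "0.15.1"):
    _get_index_dtype = scipy.sparse.sputils.get_index_dtype

    def _my_get_index_dtype(*a, **kw):
        # Discard check_contents (if given) and forward everything else.
        kw.pop('check_contents', None)
        return _get_index_dtype(*a, **kw)

    # Rebind the name in each module that imported it directly.
    scipy.sparse.compressed.get_index_dtype = _my_get_index_dtype
    scipy.sparse.csr.get_index_dtype = _my_get_index_dtype
    scipy.sparse.bsr.get_index_dtype = _my_get_index_dtype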
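Instead of listing exact version strings, the check could compare parsed version numbers against a range. The pure-Python sketch below uses hypothetical helper names (`version_tuple`, `needs_index_dtype_patch`). The `first_fixed` bound is an assumption you must choose yourself, because the answer does not say which scipy release stopped needing the patch. The patch also refers to module paths from the 0.14/0.15 era (`scipy.sparse.sputils`, `scipy.sparse.compressed`, ...) that may not exist in later releases, so an open-ended upper bound is not advisable:

```python
import re

def version_tuple(version):
    """Turn '0.15.1' into (0, 15, 1); trailing tags like 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def needs_index_dtype_patch(version, first_fixed, first_bad="0.14.0"):
    """True if first_bad <= version < first_fixed.

    first_fixed is an assumption: set it to the first release you have
    verified no longer needs the workaround.
    """
    return version_tuple(first_bad) <= version_tuple(version) < version_tuple(first_fixed)

# Example with a placeholder bound; in real code pass scipy.__version__.
print(needs_index_dtype_patch("0.15.1", first_fixed="0.16.0"))
print(needs_index_dtype_patch("0.13.3", first_fixed="0.16.0"))
```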
