4

I have a very large sparse csc_matrix x and want to apply exp() to it element-wise. In other words, I want the same result as numpy.exp(x.toarray()), but I can't do that because I don't have enough memory to convert the sparse matrix into a dense array. Is there any way around this? Thanks in advance!

Bishwajit Purkaystha
  • 1
    If you can't hold the input in dense format, you're not going to be able to hold the output; the output won't be sparse, since e^0=1. – user2357112 Feb 23 '17 at 06:46

4 Answers

8

If you don't have the memory to hold x.toarray(), you don't have the memory to hold the output you're asking for. The output won't be sparse; in fact, unless your input has negative infinities in it, the output probably won't have a single 0.

It'd probably be better to compute exp(x)-1, which is as simple as

x.expm1()
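
As a minimal sketch (the small matrix below is an invented stand-in, not the asker's data), this checks on a case small enough to densify that `expm1()` keeps the result sparse and agrees with `exp(x) - 1`:

```python
import numpy as np
from scipy import sparse

# Small stand-in for the large matrix (values chosen only for illustration).
x = sparse.csc_matrix(np.array([[0.0, 1.0, 0.0],
                                [2.0, 0.0, 0.0]]))

y = x.expm1()  # element-wise exp(x) - 1; zeros stay zero, so y is still sparse

# Same number of stored entries as the input, and agreement with the dense result.
same_nnz = (y.nnz == x.nnz)
matches_dense = np.allclose(y.toarray(), np.exp(x.toarray()) - 1.0)
```

If the true exp(x) is needed later, the "+ 1" can be applied only to the rows or blocks that are actually densified at that point.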
user2357112
  • Yes, you're right. But, how can I do it only for non zero values then? – Bishwajit Purkaystha Feb 23 '17 at 06:57
  • 1
    What's that? Sparse matrices contain all the numpy functions that happen to map zero to zero as members? There's a feature I wouldn't have expected! – Paul Panzer Feb 23 '17 at 06:58
  • Yes, `scipy/sparse/data.py` has a block of code that `# Add the numpy unary ufuncs for which func(0) = 0 to _data_matrix.`. The key is being able to access the `.data` attribute, and make a new matrix with a `_with_data` method. I wasn't aware of that either. – hpaulj Feb 23 '17 at 07:14
4

To change non-zero elements, maybe this would work for you:

import numpy as np
# x is a large scipy.sparse matrix (e.g. a csc_matrix) with a float dtype
np.exp(x.data, out=x.data)  # ask np.exp() to store the results in the existing x.data

A presumably slower alternative, since it allocates a new array instead of reusing x.data:

x.data = np.exp(x.data)

I had been wrestling with how to get an element-wise log2() of each non-zero array element. I ended up doing something like:

np.log2( x.data, out=x.data )

The following two techniques seem like exactly what I was looking for. My matrix is sparse, but it still has plenty of non-zero elements.

Credit to @DSM here for the idea of directly changing x.data, I think that is a superb insight about sparse matrices.

Credit to @Mike Müller for the idea of passing the array itself as "out". In the same thread, @kmario23 points out an important caveat: .data may need to be promoted to floats first (the inputs could be ints or something else) so that it is compatible with np.exp() or whatever function is applied. I would want to do that if I were writing something for general use.
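
A minimal sketch of that dtype caveat, assuming an integer-valued input; `exp_nonzeros_inplace` is a hypothetical helper name used only for this illustration, not part of SciPy:

```python
import numpy as np
from scipy import sparse

def exp_nonzeros_inplace(m):
    """Hypothetical helper: apply exp to the stored entries of m only."""
    if not np.issubdtype(m.data.dtype, np.inexact):
        # np.exp(int_array, out=int_array) fails with a casting error,
        # so promote the stored values to float first (this allocates once).
        m.data = m.data.astype(np.float64)
    np.exp(m.data, out=m.data)  # overwrite the stored values in place
    return m

x = sparse.csc_matrix(np.array([[0, 1], [2, 0]]))  # integer dtype
exp_nonzeros_inplace(x)
```

The promotion step costs one extra allocation of the stored values, but only when the input is not already floating-point or complex.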

note: I'm just starting to learn about sparse matrices, so would like to know if this is a bad idea for reason(s) I'm not seeing. Please do let me know if I'm on thin ice with this.

Normally I wouldn't mess with private attributes, but .data shows up pretty clearly in the attributes documentation for the various sparse matrices I've looked at.

jgreve
3

If you want to do something on nonzeros only: the data attribute is writable at least in some representations including csr and csc. Some representations allow for duplicate entries, so make sure you are acting on a "normalised" form.
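
A small sketch of why duplicates matter, using an invented COO matrix in which one position is stored twice. Without the `sum_duplicates()` call, each copy would be exponentiated separately and position (0, 0) would end up as e + e rather than exp(2):

```python
import numpy as np
from scipy import sparse

# COO allows repeated coordinates: position (0, 0) is stored twice here.
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 1])
vals = np.array([1.0, 1.0, 3.0])
x = sparse.coo_matrix((vals, (rows, cols)), shape=(2, 2))

x.sum_duplicates()          # merge repeats in place: (0, 0) now holds 2.0
np.exp(x.data, out=x.data)  # exp is now applied once per distinct entry
```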

Paul Panzer
0

If you just want to apply some element-wise function to the nonzero elements only, and ignore all the implicit (unstored) zeros (essentially a masked operation), you could do:

y = x.copy()
y.data = np.exp(y.data)

Other element-wise functions work the same way.

Note that this is a truly masked operation: the implicit zeros are skipped rather than treated as 0. That matters because applying a function to 0 may return a nonzero value (for example, exp(0) = 1).
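
To make the difference concrete, here is a small sketch (with an invented 2x2 matrix) comparing the masked result against the dense np.exp() of the same input:

```python
import numpy as np
from scipy import sparse

x = sparse.csc_matrix(np.array([[0.0, 1.0],
                                [2.0, 0.0]]))

y = x.copy()             # leave the original matrix untouched
y.data = np.exp(y.data)  # exp on the stored entries only

masked = y.toarray()          # implicit zeros are still 0
dense = np.exp(x.toarray())   # every zero has become exp(0) = 1

# The two results agree exactly where x has stored entries and differ elsewhere.
stored = x.toarray() != 0
agree_on_stored = np.allclose(masked[stored], dense[stored])
```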

Tong Zhou