
This question has two parts (maybe one solution?):

Sample vectors from a sparse matrix: Is there an easy way to sample vectors from a sparse matrix? When I try to sample rows using random.sample I get a TypeError: sparse matrix length is ambiguous.

from random import sample
import numpy as np
from scipy.sparse import lil_matrix
K = 2
m = [[1,2],[0,4],[5,0],[0,8]]
sample(m,K)    #works OK
mm = np.array(m)
sample(mm,K)   #works OK
sm = lil_matrix(m)
sample(sm,K)   #throws exception TypeError: sparse matrix length is ambiguous.

My current solution is to sample from the row indices of the matrix and then use getrow(), something like:

indxSampls = sample(range(sm.shape[0]), K)   # K distinct row indices
sampledRows = []
for i in indxSampls:
    sampledRows.append(sm.getrow(i))         # each entry is a 1 x n sparse row

Any other efficient/elegant ideas? The dense matrix size is 1000x30000 and could be larger.

Constructing a sparse matrix from a list of sparse vectors: Now imagine I have the list of sampled vectors sampledRows. How can I convert it to a sparse matrix without densifying it, converting it to a list of lists, and then converting that to a lil_matrix?

ScienceFriction

2 Answers


Try

sm[np.random.sample(sm.shape[0], K, replace=False), :]

This gets you out an LIL-format matrix with just K of the rows (in the order determined by the random.sample). I'm not sure it's super-fast, but it can't really be worse than manually accessing row by row like you're currently doing, and probably preallocates the results.

Danica
  • it doesn't really work, as it returns a list of lists of varying length rather than sparse vectors, e.g. sm.data[sample(xrange(sm.shape[0]), 2)] returns array([[1, 2], [8]], dtype=object) – ScienceFriction Mar 24 '12 at 21:57
  • @ScienceFriction Whoops, you're right: I was testing on a sample where the rows all had entries. I've changed the answer to something similar that actually gets you out a sparse matrix in one step. – Danica Mar 24 '12 at 22:00
  • + I was not familiar with xrange() which appears to be very useful :) – ScienceFriction Mar 24 '12 at 22:17
  • `TypeError: random_sample() takes at most 1 positional argument (2 given)` Perhaps this worked in the past but with modern versions of numpy `np.random.sample` is an alias to `numpy.random.random_sample` which only takes one argument `size` and spits out an array of random numbers. – mbecker Jul 20 '21 at 19:59

The accepted answer to this question is outdated and no longer works. With newer versions of numpy, you should use np.random.choice in place of np.random.sample, e.g.:

sm[np.random.choice(sm.shape[0], K, replace=False), :]

as opposed to:

sm[np.random.sample(sm.shape[0], K, replace=False), :]
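
A minimal sketch of both parts of the question, assuming a reasonably recent NumPy and SciPy. The first part indexes the sparse matrix with an array of sampled row indices, as in the line above; the second part uses scipy.sparse.vstack, which is one possible way (not discussed in the answers here) to build a sparse matrix from a list of sparse rows without densifying. The generator from np.random.default_rng, the seed, and the names row_idx, sampled and stacked are illustrative choices, not taken from the original posts.

```python
import numpy as np
from scipy.sparse import lil_matrix, vstack, issparse

K = 2
m = [[1, 2], [0, 4], [5, 0], [0, 8]]
sm = lil_matrix(m)

# Part 1: draw K distinct row indices, then index the sparse matrix directly.
# The seed is arbitrary and only makes the sketch reproducible.
rng = np.random.default_rng(0)
row_idx = rng.choice(sm.shape[0], size=K, replace=False)
sampled = sm[row_idx, :]  # remains sparse, shape (K, number of columns)

# Part 2: stack a list of 1 x n sparse rows into one sparse matrix,
# without converting anything to a dense array or a list of lists.
sampled_rows = [sm.getrow(int(i)) for i in row_idx]
stacked = vstack(sampled_rows, format="lil")

print(sampled.shape, stacked.shape)
```

If only the sampled submatrix is needed, the indexing in part 1 is enough on its own; vstack is useful mainly when the rows were already collected as separate sparse vectors.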
primaj