
I'm trying to reduce the number of calculations I do by limiting them to a search distance. I have N nodes and an [NxN] boolean mask that tells me which nodes are within distance X of each other; the mask has T true values.
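For context, a mask like this might be built as follows. This is only a sketch: the node coordinates `pos`, the radius `radius` (standing in for X) and the use of Euclidean distance are hypothetical, since the question doesn't say how its mask is produced. Note that this dense approach itself needs O(N²) memory.

```python
import numpy as np

# Hypothetical setup: N nodes with 3-D coordinates and a search radius X.
# `pos` and `radius` are illustrative names, not taken from the question.
rng = np.random.default_rng(0)
N = 6
pos = rng.random((N, 3))
radius = 0.5

# Pairwise Euclidean distances via broadcasting: shape (N, N).
dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)

# Boolean [N x N] mask: True where node j is within `radius` of node i.
# The diagonal is True (each node is at distance 0 from itself); call
# np.fill_diagonal(mask, False) if self-pairs are unwanted.
mask = dist <= radius
T = np.count_nonzero(mask)  # number of True entries
```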

I also have [Nx(d)] data for the nodes, where (d) can be (1), (3), or (3x3). I want the "sparse" format, a [Tx(d)] array, so I can do vectorized calculations along axis 0. Right now I do this:

sparseData = data.repeat(data.shape[0], axis=0).reshape((data.shape[0],) + data.shape)[mask]

This works, but it causes memory errors when N is large, because of the [NxNx(d)] array that .repeat creates. Is there a way to broadcast this instead? If I do this:
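To make the memory cost concrete, here is a small sketch with assumed sizes (N = 200 nodes, 3x3 data per node, float64). The repeat-based line materializes the full [NxNx(d)] array before the mask is applied, so the intermediate is N times larger than the input.

```python
import numpy as np

# Illustrative sizes (assumptions): N nodes, each with a 3x3 block of float64 data.
N = 200
data = np.ones((N, 3, 3))

# The repeat-based approach builds an [N x N x 3 x 3] array before masking.
full = data.repeat(N, axis=0).reshape((N,) + data.shape)
print(full.shape)                  # (200, 200, 3, 3)
print(full.nbytes)                 # 200*200*9*8 = 2880000 bytes
print(full.nbytes // data.nbytes)  # 200, i.e. N times the input size
```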

data[None,...][mask]

It doesn't work (data[None,...] has shape [1xNx(d)], which doesn't match the [NxN] mask), but it seems like there has to be a more efficient way to do this.
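A small sketch of why `data[None,...][mask]` fails, using made-up data and an arbitrary mask. Boolean indexing does not broadcast a length-1 axis up to the mask's length; current NumPy raises `IndexError` for the shape mismatch.

```python
import numpy as np

N = 4
data = np.arange(N * 3, dtype=float).reshape(N, 3)          # [N x 3] per-node data
mask = np.eye(N, dtype=bool) | np.eye(N, k=1, dtype=bool)  # any [N x N] boolean mask

# data[None, ...] has shape (1, N, 3): the new axis has length 1, not N.
# A boolean index must match the shape of the axes it indexes, so NumPy
# raises IndexError instead of broadcasting the length-1 axis.
try:
    data[None, ...][mask]
    failed = False
except IndexError as exc:
    failed = True
    print(exc)
```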

Daniel F

1 Answer


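Instead of repeating the data, you can make a read-only view with numpy.broadcast_to, which avoids allocating the [NxNx(d)] array (note that entry [i, j] of this view is data[j], whereas the repeat version has data[i]):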

sparseData = np.broadcast_to(data, (data.shape[0],) + data.shape)[mask]
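The sketch below, on made-up data and a random stand-in mask, checks two properties of the `broadcast_to` view: the added leading axis has stride 0, so nothing is copied until `[mask]` selects the T rows, and the view pairs each true entry with the column node j. If you need the row node i, as in the question's repeat version, you can broadcast a `data[:, None]` view instead.

```python
import numpy as np

N = 5
data = np.arange(N * 3, dtype=float).reshape(N, 3)
mask = np.random.default_rng(1).random((N, N)) < 0.4  # stand-in for the distance mask

# Read-only view of shape (N, N, 3); the new leading axis has stride 0,
# so no N-fold copy of `data` is made.
view = np.broadcast_to(data, (N,) + data.shape)
print(view.strides[0])  # 0

# Only the masked entries are copied: shape (T, 3), with view[i, j] == data[j].
sparse_cols = view[mask]

# To get the row node data[i] (what the repeat version gives) without copying,
# broadcast a (N, 1, 3) view along axis 1 instead.
rows_view = np.broadcast_to(data[:, None], (N, N) + data.shape[1:])
sparse_rows = rows_view[mask]
```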

Even easier, though, is to select the rows of data by index:

I, J = np.nonzero(mask)
sparseData = data[I]  # row node i, same as the repeat version; data[J] gives column node j, same as broadcast_to
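Here is a sketch of the index-based approach on hypothetical (3x3) per-node data. `np.nonzero(mask)` returns the indices in the same row-major order that boolean indexing uses, so `data[I]` and `data[J]` line up entry-for-entry with the T true values. Having both lets you do per-pair work along axis 0; the batched matrix product here is just an illustrative choice of calculation.

```python
import numpy as np

# Hypothetical example: N nodes with 3x3 per-node data, i.e. (d) = (3x3).
rng = np.random.default_rng(2)
N = 6
data = rng.random((N, 3, 3))
mask = rng.random((N, N)) < 0.5

# Row/column indices of the T true entries, in the same C order that
# boolean indexing with `mask` uses. np.where(mask) returns the same tuple.
I, J = np.nonzero(mask)
data_i = data[I]  # (T, 3, 3): the row node of each pair
data_j = data[J]  # (T, 3, 3): the column node of each pair

# Both are fancy-indexed copies of size T, never N*N. An example vectorized
# per-pair calculation along axis 0: the batched matrix product, shape (T, 3, 3).
pair_products = data_i @ data_j
```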
Mateen Ulhaq
user7138814
Thanks, I just figured out an answer based on `np.where(mask)[1]`, which is equivalent to your second answer using J. Sometimes just writing out the question brings answers to mind. – Daniel F Jan 19 '17 at 10:47