Prerequisite
This question is an extension of this post, so some of the introduction to the problem will be similar to that post.
Problem
Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element in result. The mapping of an element in values to result is stored in x_mapping and y_mapping. A position in result can be associated with several different values. An (x, y) pair from x_mapping and y_mapping is associated with result[-y, x]. I have to find the count of unique values, grouped by these associations.
An example for better clarification.
result array:
[[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]]
values array:
[ 1., 2., 1., 1., 5., 6., 7., 1.]
Note: Here the result and values arrays have the same number of elements, but that need not be the case. There is no relation between their sizes at all.
x_mapping and y_mapping hold the mappings from the 1D values to the 2D result. The sizes of x_mapping, y_mapping and values will be the same.
x_mapping - [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping - [0, 3, 2, 2, 0, 3, 2, 0]
Here, the 1st value (values[0]), the 5th value (values[4]) and the 8th value (values[7]) have x as 0 and y as 0 (x_mapping[0] and y_mapping[0]) and are hence associated with result[0, 0]. If we compute the count of distinct values in this group, (1, 5, 1), we get 2.
@WarrenWeckesser Let's see how the (x, y) pair [1, 3] from x_mapping and y_mapping contributes to result. Since only one value, i.e. 2, is associated with this particular group, result[-3, 1] will be 1, because the number of distinct values associated with that cell is one.
Another example: let's compute the value of result[-1, 1]. Since the mappings associate no value with this cell, result[-1, 1] will be zero.
Similarly, the position [-2, 0] in result will have the value 2.
Note that if there is no association at all, the default value in result will be zero.
The result after computation:
[[ 2., 0.],
[ 1., 1.],
[ 2., 0.],
[ 0., 0.]]
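To make the rule above concrete, here is a deliberately slow, pure-Python reference sketch. The function name `count_distinct_reference` is hypothetical and not part of the original solution; it only assumes that an (x, y) pair maps to row `-y` (wrapping like a negative index) and column `x`.

```python
def count_distinct_reference(x_mapping, y_mapping, values, shape):
    """Slow reference: gather each cell's values in a set, then count them."""
    rows, cols = shape
    groups = {}
    for x, y, v in zip(x_mapping, y_mapping, values):
        # (x, y) is associated with result[-y, x]; the modulo mimics
        # the wrap-around of a negative row index.
        key = ((-y) % rows, x)
        groups.setdefault(key, set()).add(v)

    # Cells without any association keep the default value 0.
    result = [[0] * cols for _ in range(rows)]
    for (row, col), distinct in groups.items():
        result[row][col] = len(distinct)
    return result


x_mapping = [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping = [0, 3, 2, 2, 0, 3, 2, 0]
values = [1., 2., 1., 1., 5., 6., 7., 1.]

expected = count_distinct_reference(x_mapping, y_mapping, values, (4, 2))
print(expected)
```

A reference like this is only useful for checking a faster implementation on small inputs; it is not meant to be fast itself.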
Current working solution
Using the answer from @Divakar, I was able to find a working solution.
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 0])
values = np.array([ 1., 2., 1., 1., 5., 6., 7., 1.], dtype=np.float32)
result = np.zeros([4, 2], dtype=np.float32)

m, n = result.shape
out_dtype = result.dtype

# Flat index of the target cell of each value: row (-y mod m), column x.
lidx = ((-y_mapping) % m) * n + x_mapping

# Sort by target cell so that every group becomes a contiguous run.
sidx = lidx.argsort()
idx = lidx[sidx]
val = values[sidx]

# Start offset of each group in the sorted arrays, and the cell it belongs to.
m_idx = np.flatnonzero(np.r_[True, idx[:-1] != idx[1:]])
unq_ids = idx[m_idx]

r_res = np.zeros(m_idx.size, dtype=np.float32)
for i in range(0, m_idx.shape[0]):
    _next = None
    arr = None
    # The last group runs to the end of the sorted values.
    if i == m_idx.shape[0] - 1:
        _next = val.shape[0]
    else:
        _next = m_idx[i + 1]
    _start = m_idx[i]
    if _start >= _next:
        arr = val[_start]
    else:
        arr = val[_start:_next]
    # Number of distinct values in this group.
    r_res[i] = np.unique(arr).size
result.flat[unq_ids] = r_res
Question
The above solution takes 15 ms to process 19943 values. I'm looking for a way to compute the result faster. Is there a more performant way to do this?
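For illustration, one loop-free formulation is sketched below. It is an assumption about what might help, not a benchmarked answer: the function name `count_distinct_vectorized` is hypothetical, and the idea is to sort by (cell, value) with `np.lexsort` so that each distinct pair shows up as a change between neighbouring sorted elements, which `np.bincount` can then tally per cell.

```python
import numpy as np


def count_distinct_vectorized(x_mapping, y_mapping, values, shape):
    """Count distinct values per target cell without a Python-level loop."""
    m, n = shape
    if values.size == 0:
        return np.zeros(shape, dtype=np.float32)

    # Flat index of the target cell of each value: row (-y mod m), column x.
    lidx = ((-y_mapping) % m) * n + x_mapping

    # lexsort uses the LAST key as the primary key: sort by cell, then value.
    order = np.lexsort((values, lidx))
    cell = lidx[order]
    val = values[order]

    # A sorted element opens a new distinct (cell, value) pair when the cell
    # or the value differs from the element before it.
    new_pair = np.r_[True, (cell[1:] != cell[:-1]) | (val[1:] != val[:-1])]

    # Tally the distinct pairs per cell; untouched cells stay at zero.
    counts = np.bincount(cell[new_pair], minlength=m * n)
    return counts.reshape(m, n).astype(np.float32)


x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 0])
values = np.array([1., 2., 1., 1., 5., 6., 7., 1.], dtype=np.float32)

fast_result = count_distinct_vectorized(x_mapping, y_mapping, values, (4, 2))
print(fast_result)
```

Whether this is actually faster than the loop for 19943 values would have to be measured, for example with `timeit`, on the real data.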
Side note
I'm using NumPy version 1.14.3 with Python 3.5.2.
Edits
Thanks to @WarrenWeckesser for pointing out that I hadn't explained how an element in result is associated with (x, y) from the mappings. I have updated the post and added examples for clarity.