1

In my program I have an array with several million entries, like this:

arr=[(1,0.5), (4,0.2), (321, 0.01), (2, 0.042), (1, 0.01), ...]

I could instead use two arrays in the same order (rather than one array of tuples) if that helps.

I know I can sort this array by key with a radix sort, so that it has this structure:

arr_sorted = [(1,0.5), (1,0.01), (2,0.042), ...]

Now I want to sum all the values in the array that have the key 1, then all that have the key 2, and so on. The sums should be written into a new array like this:

arr_summed = [(1, 0.51), (2,0.042), ...]
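
To make the target operation concrete, here is a minimal sequential sketch of "reduce by key" on already sorted data, written in plain Python with the standard library. It only illustrates the semantics (sum each run of equal keys); it is not a CUDA or NumbaPro solution. The function name `reduce_by_key` and the shortened sample data are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def reduce_by_key(sorted_pairs):
    """Sum the values of each run of equal keys.

    Assumes sorted_pairs is already sorted by key, so equal keys are adjacent.
    """
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    ]


# Shortened sample data (hypothetical); the real array has millions of entries.
arr_sorted = [(1, 0.5), (1, 0.01), (2, 0.042), (4, 0.2), (321, 0.01)]
arr_summed = reduce_by_key(arr_sorted)
```

Because the input is sorted, each key forms one contiguous segment, which is the property a parallel segmented reduction would also rely on.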

Obviously this array would be much smaller, although still on the order of 100,000 entries. Now my question is: what is the best parallel approach to this problem in CUDA? I am using NumbaPro.

Edit for clarity

I would have two arrays instead of a list of tuples like this:

keys = [1, 2, 5, 2, 6, 4, 4, 65, 3215, 1, .....]
values = [0.1, 0.4, 0.123, 0.01, 0.23, 0.1, 0.1, 0.4 ...]

They are initially numpy arrays that get copied to the device.

What I want is to reduce them by key and, if possible, set the values of missing keys (for example, if key 3 does not appear in the array) to zero.

So I would want them to become:

keys = [1, 2, 3, 4, 5, 6, 7, 8, ...]
values = [0.11, 0.41, 0, 0.2, ...] # <- Summed by key

I know how big the final array will be beforehand.
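
As a CPU reference for the result described above, the following NumPy sketch produces a dense output in which missing keys get zero. It assumes keys are positive integers starting at 1 and that the final size is known; `n_keys` and the shortened sample arrays are hypothetical. It is a way to check a GPU implementation against, not a CUDA solution in itself.

```python
import numpy as np

# Shortened sample data (hypothetical); the real arrays have millions of entries.
keys = np.array([1, 2, 5, 2, 6, 4, 4, 1])
values = np.array([0.1, 0.4, 0.123, 0.01, 0.23, 0.1, 0.1, 0.01])

n_keys = 8  # assumed known size of the final array (keys 1..n_keys)

# np.bincount sums the weights per integer key; keys that never occur get 0.
# Slot 0 is dropped because the keys in this sketch start at 1.
summed = np.bincount(keys, weights=values, minlength=n_keys + 1)[1:]
out_keys = np.arange(1, n_keys + 1)
```

Note that this variant does not require the input to be sorted, since each value is accumulated directly into the slot of its key.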

talonmies
Escapado
  • Do you have entries for each value? If not, do you want to keep track of which values there are no entries for or do you want to set those to a default value, like 0.0? Also, is it possible for an entry to have a zero or negative value? – Roger Dahl Nov 03 '15 at 16:24
  • You are talking about arrays in the question, but all the code is showing lists of tuples. Which is it? – talonmies Nov 03 '15 at 19:09
  • I should have been more precise. I don't have entries for each value and if possible I would like to set those that are not present to zero. – Escapado Nov 04 '15 at 08:41
  • My notation was not good. I have numpy arrays that get copied to the device initially, so I would have two arrays instead of a list of tuples. But they would still need to be sorted in the same manner. I found that there is a library called thrust with a function called reduce_by_key that does exactly what I want, but it is not available for numba. I'll edit the question for more clarity. – Escapado Nov 04 '15 at 08:49
  • Details matter. Some actual Python code and representative data would greatly improve the quality of this question. – talonmies Nov 05 '15 at 07:07

1 Answer

-1

I don't know Numba, but in simple Python:

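# Sample data; the real array is much longer.
arr = [(1, 0.5), (4, 0.2), (321, 0.01), (2, 0.042), (1, 0.01)]
# Largest key (the question states the final size is known beforehand).
indexmax = max(k for k, _ in arr)
# One slot per possible key; keys that never occur stay at 0.0.
res = [0.0] * (indexmax + 1)
for k, v in arr:
    res[k] += v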
Tunaki
  • Not an answer. The question is specific to NumbaPro, not plain Python. Unless you are sure it works the same in both, this does not hold. – phoenix Jan 17 '16 at 13:15