Good morning all,
I am something of a newbie with CUDA/PyCUDA, so this probably has an easy solution using some mechanism I don't know about.
I am using PyCUDA to operate on pairs of values: I subtract the smaller from the bigger and then perform some time-consuming operations. Since this must be repeated many times, it is well suited to GPUs.
However, most of the time the result of the subtraction is a value that has already come up before, so performing the time-consuming operations again makes no sense. What I do in the non-GPU version of my code is something like:
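myDictionary = {}  # cache: difference -> result of expensiveOperation (defined elsewhere)

def myFunction(A, B):
    index = abs(A - B)  # subtract the smaller value from the bigger one
    try:
        value = myDictionary[index]  # cache hit: reuse the stored result
    except KeyError:
        value = expensiveOperation(index)  # cache miss: compute once...
        myDictionary[index] = value  # ...and store it for next time
    return value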
Because a dictionary lookup is much faster than expensiveOperation, and the value is found most of the time, this gives a significant speed-up.
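To make the saving concrete, here is a small self-contained sketch of the same caching idea that also counts how often the expensive step really runs. `expensive_operation` is a hypothetical placeholder (simple arithmetic standing in for the real computation) and the pairs are made-up sample data.

```python
from collections import Counter

calls = Counter()  # how many times each difference was actually computed

def expensive_operation(diff):
    """Hypothetical stand-in for the real time-consuming computation."""
    calls[diff] += 1
    return diff ** 2 + 1  # placeholder arithmetic

cache = {}  # difference -> previously computed result

def cached_value(a, b):
    diff = abs(a - b)  # subtract the smaller value from the bigger one
    if diff not in cache:  # cache miss: compute once and store
        cache[diff] = expensive_operation(diff)
    return cache[diff]  # cache hit: reuse the stored result

pairs = [(5, 3), (3, 5), (10, 8), (7, 2), (2, 4), (9, 4)]
results = [cached_value(a, b) for a, b in pairs]
print(results)
print(sum(calls.values()), "expensive calls for", len(pairs), "pairs")
```

Six pairs only produce two distinct differences (2 and 5), so the expensive step runs twice instead of six times.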
When porting this to the GPU, I can call myFunction(A,B) with a high degree of parallelism, which is great. However, I don't know how I could use this dictionary mechanism (or a similar one) to avoid redundant operations.
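One possible "similar mechanism", assuming all pairs of a batch are known before the kernel is launched, is to drop the dictionary altogether: deduplicate the differences first, run the expensive step once per distinct difference, then give each pair its result through an index array. The sketch below shows only the data flow in plain Python on the CPU; the function names are hypothetical, and in practice the arrays would be built with something like `numpy.unique(diffs, return_inverse=True)` and the expensive step would be the part executed by a PyCUDA kernel.

```python
def expensive_operation(diff):
    """Hypothetical placeholder for the real time-consuming computation."""
    return diff ** 2 + 1

def dedup_then_scatter(pairs):
    # 1. Compute every difference (cheap; smaller value subtracted from bigger).
    diffs = [abs(a - b) for a, b in pairs]
    # 2. Distinct differences plus, for each pair, the position of its
    #    difference in that list (the "inverse index" numpy.unique can return).
    unique_diffs = sorted(set(diffs))
    position = {d: i for i, d in enumerate(unique_diffs)}
    inverse = [position[d] for d in diffs]
    # 3. Expensive step: one evaluation per distinct difference. This is the
    #    part that would run on the GPU, e.g. one thread per unique value.
    unique_results = [expensive_operation(d) for d in unique_diffs]
    # 4. Gather: each pair reads its result through the inverse index.
    return [unique_results[i] for i in inverse], len(unique_diffs)

pairs = [(5, 3), (3, 5), (10, 8), (7, 2), (2, 4), (9, 4)]
results, n_evaluated = dedup_then_scatter(pairs)
print(results, n_evaluated)
```

The appeal of this layout is that no thread ever has to read or update a shared structure while the kernel is running, so no synchronisation is needed.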
Any thoughts on this?
Thanks for your help.
Edit: Is it possible to store the dictionary on the GPU, or do I have to copy it over every time? If it lives on the GPU, can it be read and updated by several cores at the same time? How should I implement it?
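For reference, a common pattern for a shared lookup table on the GPU is a fixed-size open-addressing hash table allocated once in device memory (for example as a `pycuda.gpuarray` array, which persists across kernel launches until freed), where a thread claims an empty slot with CUDA's `atomicCAS`. The Python below only simulates that logic sequentially on the CPU to show the idea; `EMPTY`, `TABLE_SIZE`, `atomic_cas` and `lookup_or_insert` are hypothetical names, and a real version would be CUDA C compiled with `pycuda.compiler.SourceModule`. On real hardware a thread that finds the key already present would also have to wait until the owning thread has written the value (e.g. via a separate "ready" flag), which this sequential sketch does not need.

```python
EMPTY = -1        # sentinel key for an unused slot (assumes differences are >= 0)
TABLE_SIZE = 16   # fixed capacity, allocated once and reused between launches

keys = [EMPTY] * TABLE_SIZE
values = [None] * TABLE_SIZE

def atomic_cas(array, idx, expected, new):
    """Sequential stand-in for CUDA's atomicCAS: store `new` only if the slot
    still holds `expected`, and always return the old contents."""
    old = array[idx]
    if old == expected:
        array[idx] = new
    return old

def lookup_or_insert(diff, compute):
    slot = diff % TABLE_SIZE              # trivial hash
    for _ in range(TABLE_SIZE):           # linear probing, at most one full sweep
        old = atomic_cas(keys, slot, EMPTY, diff)
        if old == EMPTY:                  # this thread claimed the slot: compute and store
            values[slot] = compute(diff)
            return values[slot]
        if old == diff:                   # key already present: reuse its value
            return values[slot]           # (on a GPU, wait here until it is written)
        slot = (slot + 1) % TABLE_SIZE    # slot taken by another key: try the next one
    raise RuntimeError("hash table is full")

counter = {"calls": 0}

def expensive_operation(diff):
    """Hypothetical placeholder for the real time-consuming computation."""
    counter["calls"] += 1
    return diff ** 2 + 1

diffs = [2, 2, 2, 5, 2, 5, 18]   # 18 hashes to the same slot as 2 and is probed onward
results = [lookup_or_insert(d, expensive_operation) for d in diffs]
print(results, counter["calls"])
```

Seven lookups trigger only three expensive evaluations, and the collision between 2 and 18 is resolved by moving 18 to the next free slot.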