I am writing an application, and one part of it is highly parallelisable:
two-dimensional float arrays initialData and result
for each cell (a, b) in result array:
    for each cell (i, j) in initialData:
        result(a, b) += someComputation(initialData(i, j), a, b, i, j, some global data...);
Some more details about the algorithm:
- I'd like the iterations of the first (outer) loop to run concurrently (perhaps there is a better approach?)
- The initial data is accessed read-only
- someComputation is fairly simple: it involves multiplication, addition and computing a cosine, so it could be done on a GPU. However, it needs the indexes of the elements it is currently working on
- The arrays won't exceed ~4000 in either dimension
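As a rough illustration of the index requirement and of the size of the problem, the sketch below recovers (a, b) from a flat, row-major work-item id and counts how many someComputation calls the double loop makes. This is plain C, not kernel code; `cell_index`, `index_from_flat` and `total_evaluations` are hypothetical names introduced only for this example. In an OpenCL kernel the id would typically come from `get_global_id`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the (a, b) position of one result cell. */
typedef struct {
    size_t a; /* row */
    size_t b; /* column */
} cell_index;

/* Recover the 2D result index from a flat, row-major work-item id. */
static cell_index index_from_flat(size_t flat_id, size_t out_cols)
{
    cell_index idx = { flat_id / out_cols, flat_id % out_cols };
    return idx;
}

/* Number of someComputation calls: one per (result cell, input cell) pair. */
static uint64_t total_evaluations(uint64_t out_rows, uint64_t out_cols,
                                  uint64_t in_rows, uint64_t in_cols)
{
    return out_rows * out_cols * in_rows * in_cols;
}
```

At the upper bound of 4000 x 4000 for both arrays, this comes to 4000^4 = 2.56 * 10^14 evaluations, which is why offloading to a GPU is attractive.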
Desired library properties:
- The program is going to be written in C# (with WPF), so it would be nice if the library already had easy-to-use .NET bindings
- If no GPU is found, the algorithm should run on the CPU
- The program is going to be Windows-only, and Windows XP support is highly preferable
- The algorithm can be rewritten in OpenCL; however, I believe it is not as widely supported as pixel shaders. If there are no alternatives, OpenCL would be fine. (AFAIK CUDA runs only on NVIDIA GPUs, while OpenCL covers both NVIDIA and AMD GPUs)
I have looked at the Microsoft Accelerator library, but I haven't found a way to pass in array indexes. Any help would be appreciated, and please excuse my English.