
I have a relatively small section of code that deals with huge datasets, which I've already parallelized using OpenMP, and I am keen to increase performance further using the GPU. The program is C++, developed under VS2015, runs exclusively on Windows, and will need to support 64-bit versions from Windows 7 upwards on as wide a variety of GPUs as is feasible. Technologies I've been looking at so far include AMP, OpenCL, HLSL, and CUDA. Questions already asked, such as this with an informative answer by Ade Miller, make me question whether AMP is the way to go, although it looks like the easiest option. I'm dismissing CUDA as it limits me in terms of hardware supported, and am tending towards OpenCL while currently working my way through the following book. As such, I have the following questions:

Is OpenCL a good approach here, as other posts suggest it may also be on the way out?

If I go for OpenCL while wanting to support the widest range of GPUs, am I better off with a 1.x version of OpenCL? The reason I ask is that the OpenCL.DLL downloaded with the latest version of the CUDA SDK is 1.9. I had to download the Intel SDK for OpenCL to get a 2.x version.

If I go with OpenCL, what do I have to distribute with my application (assuming OpenCL.DLL as a minimum) and are there any licensing issues? Are default drivers for most cards going to support OpenCL and if so which versions?

With respect to the above, am I actually better off with AMP, as it works with anything that has DirectX 11 or better?

(Apologies if the above is slightly off topic, if anyone believes that it is perhaps they could point me to a better forum to ask these questions)

Vadim Kotov
SmacL
  • How big are the datasets and what kind of data do you wanna process ? – Hannes Hauptmann Aug 28 '18 at 10:16
  • The datasets run into the 10s of billions of Cartesian coordinates, though they are significantly culled prior to hitting the loops I'm looking to optimize. The two loops essentially correspond to stacks of entries for the leaf and last branch stages of an octree. – SmacL Aug 28 '18 at 10:21
  • Since you are sticking to Windows, use DirectCompute, which a lot of Open layers end up using. – Khouri Giordano Aug 28 '18 at 10:51
  • @Khouri Giordano, thanks for this. I had a look at a few DirectCompute samples and it looks like a similar amount of implementation work to OpenCL, without the potential installation and driver support issues. I'll post again once I get my solution up and running. – SmacL Aug 28 '18 at 13:48

1 Answer


Is OpenCL a good approach here, as other posts suggest it may also be on the way out?

OpenCL seems to be the most widely supported GPU computing platform. It is supported by nVidia, AMD and Intel, and works on most mobile platforms as well. There is also a large set of libraries available: ViennaCL, CLBlast, Boost-Compute and so on.

If I go for OpenCL while wanting to support the widest range of GPUs, am I better off with a 1.x version of OpenCL?

Yes, currently the safest choice is to stick with 1.2, and it is actually more than enough.

All major desktop GPU vendors (Intel, AMD, nVidia) support at least OpenCL 1.2. In fact, only nVidia has not released official 2.0 support; it is still in beta.

Also note that some older GPUs support only OpenCL 1.2.
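If you do target 1.2 as a baseline, it may be worth checking at startup what the installed driver actually reports. The OpenCL specification defines the `CL_PLATFORM_VERSION` string returned by `clGetPlatformInfo` as `OpenCL <major>.<minor> <platform-specific information>`. Below is a minimal sketch of parsing that string. The names `ClVersion`, `parse_cl_version` and `supports_at_least` are made up for illustration, and the sample strings in the usage note are only examples of the format, not output from any particular driver.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: major/minor numbers taken from a version string.
struct ClVersion {
    int major_version = 0;
    int minor_version = 0;
    bool valid = false;  // false if the string did not match the expected layout
};

// Parses a string laid out as "OpenCL <major>.<minor> <vendor info>".
// In a real program the text would come from clGetPlatformInfo with
// CL_PLATFORM_VERSION; here it is passed in so the sketch needs no SDK.
ClVersion parse_cl_version(const std::string& text) {
    ClVersion v;
    std::istringstream in(text);
    std::string prefix;
    char dot = 0;
    if ((in >> prefix >> v.major_version >> dot >> v.minor_version) &&
        prefix == "OpenCL" && dot == '.') {
        v.valid = true;
    }
    return v;
}

// Returns true if the parsed version is at least major.minor.
bool supports_at_least(const ClVersion& v, int major, int minor) {
    if (!v.valid) {
        return false;
    }
    if (v.major_version != major) {
        return v.major_version > major;
    }
    return v.minor_version >= minor;
}
```

For example, `supports_at_least(parse_cl_version("OpenCL 1.2 CUDA 9.2.148"), 1, 2)` is true, while a string reporting `OpenCL 1.1` fails the same check, so the application could fall back to its existing OpenMP path in that case.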

Artyom
  • Thanks for the answer. I tried building and running some sample OpenCL programs on my PC here, and while some of the simpler ones ran OK, I had quite a few system freezes which involved having to turn the PC off and on again. I also tried a number of DirectCompute/HLSL examples which proved easier to build and more stable, so for my current requirements I'm going to run with that. That said, I like the look of OpenCL and will come back to it when I've a bit more time. – SmacL Aug 30 '18 at 14:40