
I have a general question about how to design my application. I have read the CUDA documentation, but I still don't know what I should look into. I would really appreciate it if someone could shed some light on this.

I want to do some real-time analytics on stocks, say 100 stocks. I have a real-time market data feed that streams updated market prices. What I want to do is:

  1. Pre-allocate a memory block for each stock on the CUDA card, and keep that memory allocated throughout the trading day.

  2. When new data come in, directly update the corresponding memory on the CUDA card.

  3. After updating, issue a signal or trigger an event to start the analytical calculation.

  4. When the calculation is done, write the results back to CPU memory.

Here are my questions:

  1. What is the most efficient way to stream data from CPU memory to GPU memory? Since I want this in real time, copying a full memory snapshot from CPU to GPU every second is not acceptable.

  2. I may need to allocate a memory block for each of the 100 stocks, both on the CPU and on the GPU. How do I map each CPU memory cell to the corresponding GPU memory cell?

  3. How do I trigger the analytics calculation when new data arrive on the CUDA card?

I am using a Tesla C1060 with CUDA 3.2 on Windows XP.

Thank you very much for any suggestion.

wyizhang
  • You are most likely approaching it the wrong way. CUDA and the GPU are good at parallel computing. You need lots of independent calculations that can run in parallel for CUDA to be useful. CUDA provides APIs to transfer data from RAM to VRAM, but the GPU is good only at calculation, not at transferring. So if a lot of calculation needs to be done on a static set of data, the performance gain is convincing. If you need to transfer a lot of data back and forth, CUDA isn't going to perform well. – He Shiming Apr 24 '12 at 14:50
  • Shiming, thanks for your reply. In my case, when new data come in, the same computation will be re-run on the new data. That's why I wonder whether there is a way to stream directly from CPU memory (or the NIC) to the CUDA card. – wyizhang Apr 24 '12 at 18:35
  • It doesn't matter if the calculations are the same; it's not the same as SIMD. What matters is that transferring between RAM and VRAM has a lot of overhead (though it's faster than RAM to RAM). Remember how a game works? It loads textures into VRAM, and later the graphics presentation is based solely on VRAM content. If a game transferred textures at play time, it would be noticeably slow. – He Shiming Apr 24 '12 at 23:27

1 Answer


There is nothing unusual in your requirements.

You can keep information in GPU memory as long as your application is running, and do small updates to keep the data in sync with what you have on the CPU. You can allocate your GPU memory with cudaMalloc() and use cudaMemcpy() to write updated data into sections of the allocated memory. Or, you can hold data in a Thrust structure, such as a thrust::device_vector. When you update the device_vector, CUDA memory copies are done in the background.
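For example, a minimal sketch of the cudaMalloc()/cudaMemcpy() approach might look like the following. The StockTick struct, its fields, and the function names are just placeholders for whatever per-stock record your application actually uses:

    // Minimal sketch: one fixed-size record per stock, allocated once
    // on the GPU and updated in place. StockTick is a placeholder type.
    #include <cuda_runtime.h>

    struct StockTick {
        float     price;
        float     volume;
        long long timestamp;
    };

    static const int NUM_STOCKS = 100;

    StockTick  h_ticks[NUM_STOCKS];  // latest data on the CPU
    StockTick *d_ticks = 0;          // mirror of h_ticks on the GPU

    void initDeviceBuffer()
    {
        // Allocate the per-stock block once, at startup, and keep it
        // for the whole trading day.
        cudaMalloc((void **)&d_ticks, NUM_STOCKS * sizeof(StockTick));
    }

    void updateStock(int stockIndex, const StockTick &tick)
    {
        h_ticks[stockIndex] = tick;
        // Copy only the record that changed, not the whole array.
        cudaMemcpy(&d_ticks[stockIndex], &tick, sizeof(StockTick),
                   cudaMemcpyHostToDevice);
    }

A thrust::device_vector would replace the explicit cudaMalloc()/cudaMemcpy() calls with element assignments, with the copies happening behind the scenes.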

After you have updated the data, you simply rerun your kernel(s) to get updated results for your calculation.
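Continuing the sketch above, rerunning a kernel and copying the results back to CPU memory (step 4 in your list) could look something like this; computeIndicators and the per-stock math are placeholders for your real analytics:

    // Rerun the analytics kernel after an update and copy the results
    // back to the host. The kernel body is a placeholder calculation.
    __global__ void computeIndicators(const StockTick *ticks,
                                      float *results, int numStocks)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < numStocks)
            results[i] = ticks[i].price * ticks[i].volume;  // placeholder
    }

    void runAnalytics(float *d_results, float *h_results)
    {
        int threads = 128;
        int blocks  = (NUM_STOCKS + threads - 1) / threads;
        computeIndicators<<<blocks, threads>>>(d_ticks, d_results, NUM_STOCKS);

        // cudaMemcpy waits for the kernel to finish, then brings the
        // results back to CPU memory.
        cudaMemcpy(h_results, d_results, NUM_STOCKS * sizeof(float),
                   cudaMemcpyDeviceToHost);
    }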

Could you expand on question (2)?

Roger Dahl
  • Roger, thank you for your comment. More details: I would like to preallocate 100 memory structures for 100 stocks. New market data come in sequentially, say around 1,000 new data points per second. Is there a thread pool or some mechanism so that I can re-run the calculation on the updated cells only? Thank you. – wyizhang Apr 25 '12 at 01:37
  • You can write your kernels in such a way that they run calculations only on the updated data. However, I was picturing a system where there were dependencies between all of your data, so that you would want to rerun your calculations on your entire dataset after doing small updates. It sounds like that's not the case. If so, you may not end up with enough work to be done after each update to make it possible to run the task any faster on the GPU than on the CPU. – Roger Dahl Apr 25 '12 at 01:50
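To illustrate the suggestion in that last comment, a hypothetical kernel that touches only the updated entries could take a list of updated indices. The names and the per-stock math are placeholders, and, as noted above, whether this leaves enough work to beat the CPU depends on the analytics:

    // Hypothetical variant: the host passes only the indices of the
    // stocks that changed, so the kernel processes just those entries.
    __global__ void computeUpdatedOnly(const StockTick *ticks,
                                       const int *updatedIdx,
                                       int numUpdated, float *results)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < numUpdated) {
            int s = updatedIdx[i];
            results[s] = ticks[s].price * ticks[s].volume;  // placeholder
        }
    }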