I'm new to the forum and I hope you can help me with my question. I recently developed an application that uses CUDA streams to overlap computation with data transfers, and I ran it on an NVIDIA GPU (Maxwell architecture). In the Visual Profiler I observed that some HostToDevice data transfers appear to occur at the same time. As far as I understand, Maxwell GPUs have only 2 copy engines: one for HostToDevice transfers and one for DeviceToHost transfers. Is that right? With this in mind, I would expect that two HostToDevice transfers cannot occur at the same time, yet the Visual Profiler shows exactly this behaviour in my application. So my question is: on this architecture, is it possible for two HostToDevice (or two DeviceToHost) data transfers to occur at the same time?
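For context, the number of copy engines a device reports can be checked with the `asyncEngineCount` field of `cudaDeviceProp`. The following is only a minimal sketch, not code from my application, and it assumes the GPU in question is device 0:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the GPU of interest is device 0
    cudaDeviceProp prop;

    // Query the device properties and stop if the call fails.
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // asyncEngineCount is the number of asynchronous copy engines
    // the device reports.
    std::printf("%s: asyncEngineCount = %d\n", prop.name, prop.asyncEngineCount);
    return 0;
}
```

This only shows how many copy engines the device reports; it does not by itself explain what the profiler timeline shows.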
Thank you so much.