I’ve heard that GPUs can only execute relatively simple instructions, but do so in a massively parallel manner, which makes them well suited for machine learning.
What happens if a PyTorch tensor is on the GPU but the type of computation I want to perform is not supported by the GPU? Does the data in VRAM travel to the CPU for that specific computation, or do I get an error?
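For concreteness, here is the kind of situation I mean (a sketch; the `.numpy()` call is just one example I know of that only works on CPU tensors, and the fallback-to-CPU logic here is my guess at what would have to happen, not something I know PyTorch does automatically):

```python
import torch

# Use the GPU if one is available; otherwise this sketch just runs on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(3, 3, device=device)

# If an operation has no GPU implementation, do I have to do this by hand?
y = x.cpu()            # copies the data from VRAM into system RAM
arr = y.numpy()        # .numpy() is an example of a CPU-only operation
z = torch.from_numpy(arr).to(device)  # and copy the result back to VRAM
```

Or does PyTorch do this VRAM-to-RAM round trip for me behind the scenes?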