I am running a UNet with PyTorch on medical imaging data, with a number of transformations and augmentations in my preprocessing. However, after digging into the preprocessing packages such as TorchIO and MONAI, I noticed that most of the functions run on the CPU even when they accept tensors as input/output: they either take NumPy arrays directly or call .numpy() on the tensors.
The problem is that my data consists of 3D images of dimension 91x109x91, which I resize to 96x128x96, so they are fairly large. Running the transformations and augmentations on the CPU therefore seems quite inefficient.
First, it makes my program CPU-bound: transforming the images takes longer than running them through the model (I timed it repeatedly). Second, I checked the GPU utilization and it oscillates between roughly 0% and 100% at each batch, so the pipeline is clearly limited by the CPU. I would like to speed it up if possible.
My question is: why don't these packages use the GPU? They could at least offer hybrid functions that accept either a NumPy array or a tensor as input, since most NumPy operations have equivalents in Torch. Is there a good reason to stick to the CPU rather than speed up the preprocessing by moving the images to the GPU at the start of the pipeline?
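To illustrate what I mean by a hybrid function, here is a minimal sketch (my own example, not taken from either library) of a z-score normalization that accepts both input types and stays on whatever device a tensor already lives on:

```python
import numpy as np
import torch

def z_normalize(image):
    """Hybrid z-score normalization: accepts a NumPy array or a torch.Tensor.
    NumPy input is converted to a tensor; tensor input keeps its device,
    so a CUDA tensor is normalized on the GPU without any host round-trip."""
    if isinstance(image, np.ndarray):
        image = torch.from_numpy(image)
    image = image.float()
    return (image - image.mean()) / image.std()
```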
I translated a simple normalization function to run on the GPU and compared the running time of the GPU and CPU versions; even on a laptop GPU (NVIDIA M2000M), the GPU version was 3 to 4 times faster.
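The comparison was roughly along these lines (a minimal sketch, not my exact code; the volume size matches my data but the loop count and exact numbers are arbitrary):

```python
import time
import torch

def z_normalize(t: torch.Tensor) -> torch.Tensor:
    # Simple z-score normalization; runs on whatever device t is on.
    return (t - t.mean()) / t.std()

volume = torch.rand(96, 128, 96)

# CPU timing
start = time.perf_counter()
for _ in range(100):
    z_normalize(volume)
cpu_time = time.perf_counter() - start

# GPU timing (synchronize so the asynchronous kernels are actually measured)
if torch.cuda.is_available():
    volume_gpu = volume.cuda()
    z_normalize(volume_gpu)          # warm-up
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        z_normalize(volume_gpu)
    torch.cuda.synchronize()
    gpu_time = time.perf_counter() - start
    print(f"CPU: {cpu_time:.4f}s  GPU: {gpu_time:.4f}s")
```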
On an ML Discord, someone mentioned that GPU-based functions might not give deterministic results, which could be a reason to avoid them, but I don't know whether that's actually the case.
My preprocessing consists of resizing, intensity clamping, z-score normalization, and intensity rescaling, followed by augmentations such as random histogram shift, elastic transform, affine transform, and bias field (roughly as sketched below).
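For reference, the pipeline corresponds approximately to the following MONAI-style composition. The parameter values are placeholders rather than my actual settings, and ScaleIntensityRange folds my separate clamping and rescaling steps into a single transform, so the ordering is only an approximation:

```python
from monai.transforms import (
    Compose, Resize, ScaleIntensityRange, NormalizeIntensity,
    RandHistogramShift, Rand3DElastic, RandAffine, RandBiasField,
)

# Inputs are assumed to be channel-first volumes, e.g. shape (1, 91, 109, 91).
preprocess = Compose([
    Resize(spatial_size=(96, 128, 96)),              # resizing
    ScaleIntensityRange(a_min=-1000, a_max=1000,     # clamping + rescaling
                        b_min=0.0, b_max=1.0, clip=True),
    NormalizeIntensity(),                            # z-scoring
    RandHistogramShift(prob=0.5),                    # random histogram shift
    Rand3DElastic(sigma_range=(5, 7),                # elastic transform
                  magnitude_range=(50, 150), prob=0.5),
    RandAffine(prob=0.5),                            # affine transform
    RandBiasField(prob=0.5),                         # bias field
])
```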