I am using the Stable Diffusion inpainting pipeline
to generate inference results on an A100 (40 GB)
GPU. For a 512×512
image, inference takes approximately 3 s per image and uses about 5 GB of GPU memory.
To get faster inference, I tried running 2 threads (2 inference scripts). However, as soon as I start them simultaneously, the inference time increases to ~6 s per thread, for an effective throughput of ~3 s per image — no better than a single script.
I am unable to understand why this is so. There is still plenty of free GPU memory (about 35 GB) and a fairly large CPU RAM of 32 GB.
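My current hypothesis (I am not sure it is correct) is that the two processes are time-slicing the GPU's compute engine rather than running in parallel, so free memory does not help. Here is a pure-Python toy sketch of that hypothesis, modeling the GPU as an exclusive resource — the numbers here are placeholders, not measurements from my setup:

```python
import threading
import time

# Toy model: treat the GPU's compute engine as an exclusive resource.
# Kernels from two processes get time-sliced, not run side by side,
# even when most of the GPU memory is free.
gpu = threading.Lock()

def run_inference(job_time, latencies, idx):
    start = time.perf_counter()
    with gpu:                 # only one "kernel" executes at a time
        time.sleep(job_time)  # stand-in for the denoising loop
    latencies[idx] = time.perf_counter() - start

latencies = [0.0, 0.0]
threads = [
    threading.Thread(target=run_inference, args=(0.1, latencies, i))
    for i in range(2)
]
t0 = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
total = time.perf_counter() - t0

# One thread waits for the other, so wall time is ~2x the job time,
# and the second thread sees roughly double its solo latency —
# matching the jump I observe from ~3 s to ~6 s per image.
print(f"total ~ {total:.2f}s, per-thread latencies: {latencies}")
```

If this model is right, the two scripts cannot beat one script's throughput on a single GPU, regardless of spare memory.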
Can someone help me in this regard?