I have 4 NVIDIA 1080 GPUs (11GB each) and 128GB of RAM in my lab machine, powered by a 1600W EVGA SuperNOVA P2 power supply. I am new to deep learning and want to get a sense of what normal hardware behaviour looks like during training.
I have 70,000 medical images, each 256x256x3, and I am training AlexNet end to end on them.
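Roughly, my setup is a standard data-parallel training loop. Here is a simplified PyTorch-style sketch of what I am doing (the paths, class count, and hyperparameters are placeholders, not my exact script):

```python
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

# Images are already 256x256x3; torchvision's AlexNet uses adaptive pooling,
# so no resize is strictly needed here.
transform = transforms.ToTensor()

# ImageFolder layout and 2 classes are assumptions for illustration.
dataset = torchvision.datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(dataset, batch_size=18, shuffle=True, num_workers=4)

model = torchvision.models.alexnet(num_classes=2)
model = nn.DataParallel(model, device_ids=[0, 1, 2]).cuda()  # 3 of the 4 GPUs

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(10):
    for images, labels in loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```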
If I set the batch size to anything above 18 while using 3 of my GPUs, the computer powers down and then restarts. GPU-Burn runs fine on all GPUs, and with batch sizes of 4-8 I can use all 4 of them. Despite all this, GPU temperatures stay at 70-75 °C with no more than 60% utilisation on each of the 3 GPUs.
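If it would help, I can log per-GPU power draw, temperature, and utilisation right up to the shutdown with something like the following (a minimal sketch using standard nvidia-smi query fields; the 1-second interval and log path are arbitrary):

```python
import subprocess

# Poll nvidia-smi every second and write the readings to a CSV file.
cmd = [
    "nvidia-smi",
    "--query-gpu=index,power.draw,temperature.gpu,utilization.gpu",
    "--format=csv",
    "-l", "1",
]
with open("gpu_log.csv", "w") as log:
    subprocess.run(cmd, stdout=log)
```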
Is this normal? I would have thought this hardware could handle considerably larger batch sizes.
Thanks.