
I have been trying to use the Python subprocess module while training a neural network in PyTorch, but I noticed that subprocess.run() becomes many times slower once a network has been initialized on a GPU. Here is an example script I used: it builds a very simple linear network, profiles the times with line_profiler, and loops a simple subprocess call 100 times:

import torch
import torch.nn as nn
import subprocess
from line_profiler import LineProfiler

class TestNN(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.fc1 = nn.Linear(5, 16)
        self.device = device
        self.to(self.device)  # move the parameters onto the requested device

def test_subprocess():
    device = torch.device('cuda:0')  # use torch.device('cpu') for the CPU comparison
    testNet = TestNN(device)

    for i in range(100):
        subprocess.run(["ls", "-l"], capture_output=True)

if __name__ == '__main__':
    lprofiler = LineProfiler()
    lp_wrapper = lprofiler(test_subprocess)

    lp_wrapper()
    lprofiler.print_stats()

Just moving this small network to the GPU makes subprocess.run() execute more than 4x slower.
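To narrow things down, here is a minimal sketch (my own check, separate from the profile above) that times the same loop before and after forcing CUDA initialization, with no network at all; if the second timing is also slow, the CUDA context alone seems to be enough to trigger the effect:

import subprocess
import time

import torch

def time_runs(n=100):
    # Time n back-to-back subprocess calls.
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run(["ls", "-l"], capture_output=True)
    return time.perf_counter() - start

print(f"before CUDA init: {time_runs():.2f} s")
torch.cuda.init()  # force CUDA context creation without building any network
print(f"after CUDA init:  {time_runs():.2f} s")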

My results from line_profiler when the network is on the CPU:

Total time: 1.46088 s
File: test_subprocess_gpu.py
Function: test_subprocess at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           def test_subprocess():
    14         1        172.0    172.0      0.0      device = torch.device('cpu')
    15         1        806.0    806.0      0.1      testNet=TestNN(device)
    16                                           
    17       101       1235.0     12.2      0.1      for i in range(100):
    18       100    1458671.0  14586.7     99.8          subprocess.run(["ls",  "-l"], capture_output=True)

My results when the network is initialized on the GPU:

Timer unit: 1e-06 s

Total time: 8.63406 s
File: test_subprocess_gpu.py
Function: test_subprocess at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           def test_subprocess():
    14         1        174.0    174.0      0.0      device = torch.device('cuda:0')
    15         1    2084937.0 2084937.0     24.1      testNet=TestNN(device)
    16                                           
    17       101       1163.0     11.5      0.0      for i in range(100):
    18       100    6547789.0  65477.9     75.8          subprocess.run(["ls",  "-l"], capture_output=True)

Does anyone know what causes this slowdown and how to get subprocess.run() back to normal speed with the network initialized on the GPU? I am very puzzled as to why initializing a neural network on the GPU would have any effect on the speed of subprocess.run(). Any help is greatly appreciated!

  • It takes more time to send the data to the GPU than to leave it on the CPU; is that not expected behavior? – Ivan Jul 15 '21 at 22:30
  • It makes sense that the model initialization takes longer on the GPU, but why does subprocess.run() take longer? Shouldn't subprocess.run() be occurring on the CPU regardless of where the model is initialized? – David Wang Jul 15 '21 at 22:36
  • I don't believe the allocation on the GPU is happening asynchronously, hence the difference in times. – Ivan Jul 16 '21 at 05:43
  • Sorry, could you elaborate on what you mean by that? What seems strange to me is that after the model has been initialized, all future calls to subprocess.run() get slowed down. What should I do if I just want subprocess to run on the CPU at normal speed regardless of where my model gets initialized? – David Wang Jul 16 '21 at 17:02
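Update, following the comment thread: one workaround I am experimenting with, under the assumption (not confirmed) that the cost comes from fork()-ing a process whose address space the CUDA context has inflated, is to start a small helper process before any CUDA work and route all subprocess calls through it. A sketch:

import multiprocessing as mp
import subprocess

def _runner(q_in, q_out):
    # Helper loop: run each command it receives until it gets None.
    for cmd in iter(q_in.get, None):
        result = subprocess.run(cmd, capture_output=True)
        q_out.put(result.stdout)

if __name__ == "__main__":
    # Start the helper BEFORE touching CUDA, so it forks while the
    # parent process is still small.
    q_in, q_out = mp.Queue(), mp.Queue()
    helper = mp.Process(target=_runner, args=(q_in, q_out))
    helper.start()

    import torch
    net = torch.nn.Linear(5, 16).to("cuda:0")  # CUDA context lives only in the parent

    q_in.put(["ls", "-l"])  # subprocess now forks from the small helper
    print(q_out.get()[:80])

    q_in.put(None)  # shut the helper down
    helper.join()

Since the helper forks while the process is still small, each subprocess.run() inside it should stay cheap. I have also read that newer CPython versions (3.10+) can use vfork() on Linux, which would avoid copying the parent's page tables, but I have not verified that on my setup.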
