
I have a bash file (generateData.sh) that contains hundreds of Python commands such as:

python create_video.py input output --param1 --param2 --param3

Each Python script processes a different input video and uses one of the two available GPUs on my machine (for some computer vision tasks).

I tried to parallelize the bash file using GNU parallel (or xargs or &) with :

cat generateData.sh | parallel

This launches 48 Python scripts in parallel. However, because of the limited memory on the GPUs, only 10 of them finish correctly with a good output video. The other input videos are not handled at all, probably because the jobs hit CUDA out-of-memory errors.

I would like GNU parallel to wait for some jobs to finish, so that there is free memory on the GPUs before new jobs start. Otherwise, GNU parallel will only correctly process a subset of my bash file.

Thanks for your answers!

1 Answer


Something like this:

cat python-lines.sh |
  parallel -j48 CUDA_VISIBLE_DEVICES='$(({%} % 2))' {=uq=}

to run 24 jobs per GPU on 2 CUDA devices.
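This round-robin assignment works because GNU parallel's {%} replacement string is the job slot number (1 through 48 with -j48), so {%} % 2 alternates between 1 and 0, pinning each job to one of the two CUDA devices. A minimal sketch of that mapping, shown with plain bash arithmetic and no GPU involved:

```shell
# Sketch of the slot-to-GPU mapping used in the answer: job slot numbers
# taken modulo 2 alternate between the two CUDA devices (1, 0, 1, 0, ...).
for slot in 1 2 3 4; do
  echo "job slot $slot -> CUDA_VISIBLE_DEVICES=$((slot % 2))"
done
```

In the real command the single quotes matter: they keep the shell from evaluating $((...)) too early, so parallel first substitutes {%} and the arithmetic is evaluated when each job starts.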

Ole Tange
  • Thanks for your answer! The thing is that I want to put more than 2 jobs on each GPU. For example, I could put 5 videos on each GPU, but it depends on the size of the video. – Florian Delcour Mar 08 '23 at 14:13
  • @FlorianDelcour How do you force the computation onto GPU 2? How do you determine whether there is capacity on GPU 2 to run the next job on it? – Ole Tange Mar 09 '23 at 09:55
  • Using the nvidia-smi command, I can check the memory usage of the GPUs. For example, a 720p video takes 1002 MiB on the GPU, so I can parallelize ~24 videos on one GPU with 24 GB of RAM. However, while your solution looks quite nice, it doesn't work with my bash file: I get "No such file or directory" for every line of the bash file. – Florian Delcour Mar 09 '23 at 10:43
  • @FlorianDelcour Try: `{=uq=}` - see above (requires version 20190722 or later) – Ole Tange Mar 10 '23 at 13:12
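The memory check described in the comments could be scripted along these lines. This is a hypothetical helper, not from the thread: pick_gpu and the 1500 MiB threshold are illustrative assumptions, and in real use the free-memory figures would come from nvidia-smi rather than being passed by hand.

```shell
# Hypothetical helper (not from the answer): given a required amount of
# memory in MiB followed by the free memory of each GPU in MiB, print the
# index of the first GPU with enough room, or print nothing and fail.
pick_gpu() {
  local need=$1; shift   # MiB required by the next job
  local i=0
  for free in "$@"; do   # one free-memory value per GPU, in index order
    if [ "$free" -ge "$need" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1               # no GPU currently has enough free memory
}

# In real use, the free values would come from something like:
#   nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits
# GPU 0 has 900 MiB free, GPU 1 has 12000 MiB free, and we need 1500 MiB:
pick_gpu 1500 900 12000   # prints 1
```

A wrapper built on this could poll in a loop (retrying with a short sleep while pick_gpu fails) and then launch the job with CUDA_VISIBLE_DEVICES set to the chosen index, letting parallel keep many jobs queued while only as many run as the GPUs can hold.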