
I am trying to download 100 files using a script.

At no point do I want more than 4 downloads happening at the same time.

So I have created a folder `/home/user/file_limit`. The script creates a file there before a download starts and deletes it after the download is complete.

The script checks that the number of files in the folder is less than 4; only then does it create a file in `/home/user/file_limit`.

I am running a script like this:

    today=$(date +%Y-%m-%d-%H_%M_%S_%N)
    while true
    do
        sleep 1
        # Count the number of download files currently in /home/user/file_limit
        lines=$(find /home/user/file_limit -iname 'download_*' -type f | wc -l)
        if [ "$lines" -lt 4 ]; then
            echo "Create file"
            touch "/home/user/file_limit/download_${today}"
            break
        else
            echo "Number of files is already 4"
        fi
    done

    # After this, the actual download happens, and once it is complete:

    rm "/home/user/file_limit/download_${today}"

The problem I am facing is when 100 such scripts are running. For example, when the number of files in the folder is less than 4, many of the `touch "/home/user/file_limit/download_${today}"` commands get executed simultaneously and all of them create files. So the total number of files becomes more than 4, which I don't want, because more downloads make my system slower.

How do I ensure there is a delay between each script's check of `lines=$(find /home/user/file_limit -iname 'download_*' -type f | wc -l)` so that only one `touch` command gets executed?

Or how do I ensure that the `lines=$(find /home/user/file_limit -iname 'download_*' -type f | wc -l)` check is executed by each script in a queue, so that no two scripts can check it at the same time?

Santhosh

  • This is a bit of an XY problem. The linked duplicate answers X (the problem you actually wanted to solve). This question is about Y (solving problems in your approach). Just as an extra, here's some information on the Y part: ¶ Adding a delay won't solve the problem. You need a *lock*, *mutex*, or *semaphore* to ensure that the check and creation of files is executed atomically. That is, if one process executes the part *"check and create files"*, other processes cannot execute this part too. GNU parallel comes with the utility `sem` for that. – Socowi May 13 '20 at 06:33
  • I don't want the X way because I have to do more things during the download, not just wget. So can you explain the Y way in detail? – Santhosh May 13 '20 at 08:59
  • Also, to use parallel you have to know the full list of items to run in advance. Here a new script instance can be started at any time, and it will also wait to touch a file until the number of files in the directory is less than 4. – Santhosh May 13 '20 at 09:15
  • Aside from [bugs](https://unix.stackexchange.com/q/511004/187122) `parallel` is perfectly capable of working on a list of jobs without knowing the full list in advance. Also, you can execute multiple commands in `parallel` just as you would in your shell, for instance `seq 3 | parallel 'echo 1st step for {}; echo 2nd step for {}'` executes the commands `echo 1st` **and** `echo 2nd` for each input. Anyways, I added an answer for what you asked (even though it might not be the best approach compared to using `parallel` directly). – Socowi May 13 '20 at 12:12

2 Answers


> How do I ensure there is a delay between each script's check of `lines=$(find ... | wc -l)` so that only one `touch` command gets executed?

Adding a delay won't solve the problem. You need a lock, mutex, or semaphore to ensure that the check and creation of files is executed atomically.

Locks limit the number of parallel processes to 1. Locks can be created with `flock` (usually pre-installed). Semaphores are generalized locks limiting the number of concurrent processes to any number N. Semaphores can be created with `sem` (part of GNU parallel, which has to be installed).
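For the question's directory-counting approach, a minimal sketch using `flock` could look like this. Only the check-and-create section is locked, so the downloads themselves still run in parallel; the lock file path `/home/user/file_limit.lock` is an assumption:

    #! /usr/bin/env bash
    # Sketch: serialize only the "check and create" critical section.
    today=$(date +%Y-%m-%d-%H_%M_%S_%N)
    while true; do
        (
            flock -x 9    # exclusive lock on fd 9; other scripts block here
            lines=$(find /home/user/file_limit -iname 'download_*' -type f | wc -l)
            if [ "$lines" -lt 4 ]; then
                touch "/home/user/file_limit/download_${today}"
                exit 0    # slot taken; lock is released when the subshell exits
            fi
            exit 1        # all 4 slots busy; release the lock and retry
        ) 9>/home/user/file_limit.lock && break
        sleep 1
    done

    # ... download here ...

    rm "/home/user/file_limit/download_${today}"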

The following script allows 4 downloads in parallel. If 4 downloads are running and you start the script a 5th time, that 5th download will pause until one of the 4 running downloads finishes.

    #! /usr/bin/env bash

    main() {
        # put your code for downloading here
    }
    export -f main

    # one named semaphore is shared by every process using the same --id;
    # -j4 allows at most 4 jobs to run concurrently
    sem --id downloadlimit -j4 main
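Because `--id downloadlimit` names the semaphore, every invocation of the script shares the same limit. A hypothetical caller (`download.sh` stands for the script above):

    for i in $(seq 100); do
        ./download.sh "$i"            # each run calls `sem --id downloadlimit -j4 main`
    done
    sem --wait --id downloadlimit     # block until the whole queue has drained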
Socowi
  • In the middle of my code, how do I run part of it with parallel, capture the return value from that, and continue my code from there? – Santhosh May 21 '20 at 17:29
  • To do this, your script should have the structure: **(1)** first part **(2)** code from this answer with the middle part inside `main` and with `--fg` added to `sem` **(3)** the last part. ¶ Checking the exit code is a bit tricky. `sem` does not forward exit codes. You can print the exit code and read it (`read code < <(sem … 'main; echo $?')`) or write it to a file (`sem … 'main; echo $? > /tmp/exitcode'`). See the sketch below these comments. – Socowi May 21 '20 at 21:11
  • What does `export -f` do? – Santhosh May 22 '20 at 05:32
  • I am using `zsh`; it says `-f no such option for export`. – Santhosh May 22 '20 at 05:37
  • Somehow in `zsh`, when I do `sem --id downloadlimit main`, it says `zsh:1: command not found: main`. – Santhosh May 22 '20 at 05:40
  • I asked this as a new question. Can you check it? https://stackoverflow.com/questions/61948652/zsh-and-parallel-how-to-use-functions-it-says-command-not-found – Santhosh May 22 '20 at 05:53
  • 1
    @SanthoshYedidi Um, why did tag your question as [tag:bash] when you use [tag:zsh]? I explicitly added the shebang for bash to point out that this script relies on bashisms. I'm no expert in zsh. Hopefully, another user wants to help you with that. Good luck. – Socowi May 22 '20 at 07:02
  • Can I set some priority too, e.g. `sem --id downloadlimit --priority 100 main` and `sem --id downloadlimit --priority 500 main2`? – Santhosh May 22 '20 at 14:15
  • How should the priority work? Do you want to pause already running downloads with low priority when a download with high priority wants to start? Anyways, this comment section gets out of hand. Please open a new question for things like that. – Socowi May 22 '20 at 14:31
  • I don't want to stop the existing processes, but when a new one begins, it should start the one with higher priority first. – Santhosh May 22 '20 at 15:10
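A minimal sketch of the three-part structure Socowi describes in the comments above; the echoed messages are placeholders and the actual download would go inside `main`:

    #! /usr/bin/env bash

    echo "first part of the script"

    main() {
        # middle part: the actual download goes here
        true
    }
    export -f main

    # --fg keeps sem in the foreground; sem does not forward main's
    # exit code, so print it inside the job and read it back
    read code < <(sem --fg --id downloadlimit -j4 'main; echo $?')

    echo "last part continues, main exited with: $code"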

My solution starts at most MAXPARALLELJOBS processes at a time and waits until all of those processes are done before starting the next batch.

Hope it helps your problem.

    MAXPARALLELJOBS=4
    count=0
    while <not done the job>
    do
        ((count++))
        ( <download job> ) &
        # after a full batch has been started, wait for all of it to finish
        [ ${count} -ge ${MAXPARALLELJOBS} ] && count=0 && wait
    done
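Note that this runs the downloads in batches: after 4 have been started, nothing new begins until the whole batch has finished. If you want a freed slot to be refilled immediately, here is a sketch using `wait -n` (requires bash 4.3 or newer; `download_one` and the `urls` array are placeholders):

    MAXPARALLELJOBS=4
    running=0
    for url in "${urls[@]}"; do
        ( download_one "$url" ) &
        ((running++))
        if ((running >= MAXPARALLELJOBS)); then
            wait -n            # returns as soon as any one job finishes
            ((running--))
        fi
    done
    wait                       # wait for the jobs still running at the end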
Swifty