Multiple reads from a txt file in bash (parallel processing)

Question

Here is a simple bash script that checks the HTTP status code of each URL:

while IFS= read -r url
    do
        urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' --max-time 5 "${url}")
        echo "$url  $urlstatus" >> urlstatus.txt
    done < "$1"

I am reading the URLs from a text file, but the script processes only one at a time, which takes too long. GNU parallel and xargs also process one line at a time (tested).

How can I process several URLs simultaneously to speed this up? In other words, I want to parallelise over the lines of the URL file rather than over bash commands (which is what GNU parallel and xargs do).

 Input file is txt file and lines are separated  as
    ABC.Com
    Bcd.Com
    Any.Google.Com

Something  like this

Why not read the file and spin off a different nohup script for each URL? – Karan Shah, Jan 18 '17 at 08:10
What takes too long exactly? Please give an example. A `bash` loop reading 10,000 URLs will probably finish before your first 2-3 `curl` commands so that is not the bottleneck and not worth optimising. Just use **GNU Parallel** to run the `curl` commands. – Mark Setchell, Jan 18 '17 at 08:22
Actually, the problem is that parallel is processing multiple commands rather than multiple URLs. – user7423959, Jan 18 '17 at 08:25
For example, `cat abc.txt | parallel -j100 --pipe /root/bash5.sh abc.txt` processes one URL at a time, just like a normal bash script run. – user7423959, Jan 18 '17 at 08:26
@user7423959 Use `parallel` to run `curl` *in* your script, rather than using `parallel` to run your script. – chepner, Jan 18 '17 at 14:15

score 2 · Accepted Answer · answered Jan 18 '17 at 23:24

GNU parallel and xargs also process one line at time (tested)

Can you give an example of this? If you use -j, you should be able to run many more than one process at a time.
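To illustrate what running more than one process at a time looks like, here is a minimal sketch. It assumes an xargs that supports the non-POSIX -P option (the GNU and BSD versions do). The `sleep 1` is only a stand-in for a slow curl call, and the letters a to d are placeholder input lines, not real URLs.

```shell
#!/bin/sh
# Stand-in workload: each input line costs one second, like a slow request.
start=$(date +%s)

# -P 4 allows up to four jobs at once; -I{} starts one job per input line.
out=$(printf '%s\n' a b c d |
    xargs -P 4 -I{} sh -c 'sleep 1; echo "done $1"' _ {})

end=$(date +%s)
elapsed=$((end - start))

# The order of the "done" lines is not guaranteed, since the jobs overlap.
echo "$out"
echo "elapsed: ${elapsed}s"
```

If the jobs really overlap, the elapsed time should be close to one second rather than four. Dropping `-P 4` makes xargs fall back to one job at a time.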

I would write it like this:

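doit() {
    # Fetch only the headers of one URL and print "url  status".
    url="$1"
    urlstatus=$(curl -o /dev/null --silent --head --write-out  '%{http_code}' "${url}" --max-time 5 )
    echo "$url  $urlstatus"
}
# Export the function so the shells started by parallel can call it.
export -f doit
# -j0 runs as many jobs at once as possible; -k keeps output in input order.
cat "$1" | parallel -j0 -k doit >> urlstatus.txt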

Based on the input:

Input file is txt file and lines are separated  as
ABC.Com
Bcd.Com
Any.Google.Com
Something  like this
www.google.com
pi.dk

I get the output:

Input file is txt file and lines are separated  as  000
ABC.Com  301
Bcd.Com  301
Any.Google.Com  000
Something  like this  000
www.google.com  302
pi.dk  200

Which looks about right:

000 if the domain does not exist (curl received no HTTP response)
301/302 for redirection
200 for success
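As a small sketch of how these codes could be turned into labels when reading urlstatus.txt, the helper below maps each code to a short description. The function name `describe_status` and the "other" fallback are assumptions made for illustration; the sample lines are taken from the output above and contain no spaces inside the URL.

```shell
#!/bin/sh
# Hypothetical helper: turn a status code from urlstatus.txt into a short label.
describe_status() {
    case "$1" in
        000)     echo "no HTTP response" ;;
        200)     echo "success" ;;
        301|302) echo "redirect" ;;
        *)       echo "other" ;;
    esac
}

# Sample lines in the same "url  status" layout as the output above.
printf '%s\n' 'ABC.Com  301' 'Any.Google.Com  000' 'pi.dk  200' |
while read -r url code; do
    echo "$url: $(describe_status "$code")"
done
```

Lines whose first field contains spaces, such as the prose lines in the test input, would need different parsing, for example taking the last field as the code.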

Ole Tange


I will test and let you know – user7423959 Jan 19 '17 at 02:18
Hey, I got the same status code 000. Can you tell me how you are executing your script from the terminal? It may help. – user7423959 Jan 19 '17 at 04:33
`cat input.txt | parallel -j0 -k doit >> urlstatus.txt;` As you can see, I also get 000 for the domains that do not exist. I am wondering whether you actually gave us an extract from your input. If the 6 lines are not actually in your input file, then could you please give 10 lines from your _actual_ input file? – Ole Tange Jan 19 '17 at 07:42
Let me explain the whole process: 1. I copied your bash script, saved it as bash.sh and gave it execute permission. 2. My input file is big, but I also tested on a small 10-line file. Here is the list: www.yahoo.com, www.google.com, facebook.com, amazon.com, bing.com, apple.com, www.microsoft.com, www.windows.com, each on its own line and saved as top.txt. 4. Then I go to the terminal and type ./bash.sh top.txt. 5. It gives the result 000 for each one. 6. Can you help me find where I am going wrong? Thanks. – user7423959 Jan 19 '17 at 09:18
