
I am taking an introductory course on UNIX - part of it is bash scripting. I seem to have understood the concepts, but I can't wrap my head around the issue in this particular problem.

I have a txt file consisting of a single column of random usernames. That txt file is then passed as a parameter to my bash script, which should use each username to fetch a page and count the characters on that page. If the page is fetched successfully, the character count is saved along with the username in a different txt file.
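
For context, the input file would look something like this, one username per line (the names and file names here are just for illustration, the real file has random usernames):

alice
bob
carol

and the script is invoked as ./myscript.sh usernames.txt.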

Here is the code:

#!/bin/bash
filename=$1

while read username; do
    curl -fs "http://example.website.domain/$username/index.html"
    if [ $? -eq 0 ]
    then
        x=$(wc -m)
        echo "$username $x" > output.txt
    else
        echo "The page doesn't exist"
    fi
done < $filename

Now the problem I have here is that after one successful fetch, it counts the characters, writes them to the file, and then just finishes the loop and exits the program. If I remove just the "wc -m" bit, the code runs perfectly fine.

Q: Is that supposed to happen, and how should I work around it to achieve my goal? Or have I made a mistake somewhere else?


3 Answers


The code shown does not do what you think (and claim in your question).

Your curl command fetches the web page and writes it to stdout: you are not keeping that output for later use. Then, your wc has no argument, so it starts reading from stdin. And on stdin you have the list of usernames from $filename, so the number that gets computed is not the character count of the page, but the count of the remaining characters in the file. Once those have been consumed, there is nothing left on stdin to be read, so the loop ends because it reached the end of the file.
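
You can reproduce the effect with a minimal sketch (a throwaway file and names, chosen purely for demonstration):

printf 'alice\nbob\ncarol\n' > users.txt
while read -r name; do
    echo "read: $name"
    wc -m          # no argument: inherits the loop's stdin and consumes the remaining lines
done < users.txt

This prints "read: alice", then 10 (the characters of "bob" and "carol" plus their newlines), and the loop stops because the redirected file has been exhausted.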

You are looking for something like:

#!/bin/bash
filename="$1"

set -o pipefail
rm -f output.txt
while read username; do
    x=$(curl -fs "http://example.website.domain/$username/index.html" | wc -m)
    if [ $? -eq 0 ]
    then
        echo "$username $x" >> output.txt
    else
        echo "The page doesn't exist"
    fi
done < "$filename"

Here, the page fetched is fed directly to wc. If curl fails you won't see that (by default, the exit code of a series of piped commands is the exit code of the last command), so we use set -o pipefail to get the exit code of the rightmost command that exits with a non-zero status. Now you can check whether everything went OK, and in that case, write the result.
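
To illustrate what set -o pipefail changes, here is a tiny sketch you can run in an interactive shell (the commands are chosen purely for demonstration):

false | wc -m; echo $?    # wc prints 0 (empty input); echo prints 0, the status of wc, the last command
set -o pipefail
false | wc -m; echo $?    # wc still prints 0, but echo now prints 1, the status of the failing false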

I also added an rm of the output file to make sure we are not growing an existing one, and changed the redirection to the output file to an append, to avoid re-creating the file on each iteration and ending up with only the result of the last iteration (thanks to @tripleee for noting this).
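
If the difference between > and >> is not obvious, this small sketch shows it (out.txt is just a scratch file):

echo one   > out.txt    # out.txt now contains only "one"
echo two   > out.txt    # ">" truncates first: out.txt now contains only "two"
echo three >> out.txt   # ">>" appends: out.txt now contains "two" and "three"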

Update (by popular request):

The pattern:

<cmd>
if [ $? -eq 0 ]...

is usually a bad idea. It is better to go for:

if <cmd>...

So it would be better if you switch to:

if x=$(curl -fs "http://example.website.domain/$username/index.html" | wc -m); then
    echo...
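
Putting the pieces together, the whole loop would then look roughly like this sketch (set -o pipefail from above is still required so that the if reflects curl's status rather than wc's, as the comments below point out):

set -o pipefail
while read username; do
    if x=$(curl -fs "http://example.website.domain/$username/index.html" | wc -m); then
        echo "$username $x" >> output.txt
    else
        echo "The page doesn't exist"
    fi
done < "$filename"
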
Poshi
  • This will cause the `if` to examine the exit code from `wc` which will always be true. Anyways, you want to avoid the antipattern of explicitly examining `$?` – tripleee Feb 28 '19 at 13:31
  • @tripleee `set -o pipefail` preserves curl's error status. – John Kugelman Feb 28 '19 at 13:32
  • Nice save, but you still have the antipattern. – tripleee Feb 28 '19 at 13:33
  • @tripleee you are right, but I didn't want to modify his code any more than necessary. Anyway, with the `pipefail` option, the exit code examined should be the one of the failing command, not the last command... or am I missing something? – Poshi Feb 28 '19 at 13:33
  • No, that's all sorted now. But you are still overwriting the output file on each iteration. – tripleee Feb 28 '19 at 13:33
  • Hahaha! Details, details... nothing important XD – Poshi Feb 28 '19 at 13:39
  • Thank you, somehow I thought that because wc -m is nested in the if statement it would automatically take curl as input, unless something else is specified. – Arthur Edelman Feb 28 '19 at 14:16

The wc program (like many other utilities you find on Linux) by default reads its input from stdin (standard input) and writes its output to stdout (standard output).
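
For example (notes.txt is just a placeholder name here):

wc -m notes.txt      # counts the characters in the named file
wc -m < notes.txt    # same count, but wc reads the data from stdin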

In your case, you want wc to operate on the result of your curl call. You can achieve this by storing the result of curl in a variable and passing the contents of the variable to wc:

data=$(curl -fs "http://example.website.domain/$username/index.html")
...
x=$(echo "$data" | wc -m)

Or, you can put the entire command in one pipeline, which is probably better (although you might want to set -o pipefail in order to catch errors from curl):

x=$(curl -fs "http://example.website.domain/$username/index.html" | wc -m)

Otherwise, as @Dominique states, your wc will wait for input indefinitely.
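
You can see that behaviour by running wc by itself in a terminal:

wc -m      # blocks, reading from the terminal until you press Ctrl-D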

Jan Remeš

As others have already noted, just wc will "hang" because it expects you to provide input on stdin.

You seem to be looking for something like

#!/bin/bash
filename=$1
# Use read -r
while read -r username; do
    if page=$(curl -fs "http://example.website.domain/$username/index.html"); then
        # Feed the results from curl to wc
        x=$(wc -m <<<"$page")
        # Don't overwrite output file on every iteration
        echo "$username $x"
    else
        # Include parameter in error message; print to stderr
        echo "$0: The page for $username doesn't exist" >&2
    fi
# Note proper quoting
# Collect all output redirection here, too
done < "$filename" >output.txt
tripleee