
I have a wget script named Chktitle.sh that takes a URL as its argument, like this:

$ Chktitle.sh "my url"

I also have a file named url.txt with over 100 lines of URLs and IPs whose web-page titles I want to check, and a blank file named results.txt.

Is there a way to perform the following repetitive action for each line in the file:

 Grab line1 from url.txt
 -----
 Execute Chktitle.sh "line1"
 -----
 Save the result for line1 in results.txt
 -----
 Go to line2 and repeat, and so on for every line

I need to make sure that each line is only processed after the previous one has finished. Can anyone show me an easy way to do this? I am happy to use Perl or sh, and would consider other languages.

The contents of Chktitle.sh:

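#!/bin/bash
# Append /search/ to the URL given as the first argument
string="$1/search/"
# Fetch the page quietly to stdout and print only the <title> text
wget --quiet -O - "$string" \
| sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'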
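To try the title-extraction step on a saved page without calling wget, the same sed expression can be wrapped in a function that reads HTML on standard input. This is only a sketch; extract_title is a hypothetical name. It also makes one limitation visible: a title whose <title> and </title> tags sit on different lines produces no output, because sed works line by line.

```bash
#!/bin/bash
# Hypothetical helper: the sed step of Chktitle.sh, reading HTML on stdin.
extract_title() {
    # Print only the text between <title> and </title>, and only when
    # both tags appear on the same line.
    sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
}

# prints: Hello
printf '<html><head><title>Hello</title></head>\n' | extract_title
```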
Cœur
  • I would say that it would be much better to put the whole thing into a single Perl script. (Perl because I'm better at it than shell.) What's inside `Chktitle.sh`? Is it complex? – Borodin Dec 05 '14 at 01:33
  • No, it is not complex; it is just the script shown above. – Leo Bishop Dec 06 '14 at 18:48

4 Answers


Maybe something like this could help (provided that I understood correctly):

while IFS= read -r line; do
    /path/to/Chktitle.sh "$line" >> results.txt
done < /path/to/input.txt

For each line in /path/to/input.txt, this executes your script and appends its output (>>) to results.txt. The script runs in the foreground, so each call finishes before the next line is read.
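The loop above can also be packaged as a function so it is easy to rerun on different files. This is a minimal sketch, not the answer's own code: the name run_checks and its parameter order are hypothetical, and the `< /dev/null` is only a precaution in case a checker ever reads standard input (which would otherwise swallow lines from the URL file).

```bash
#!/bin/bash
# Hypothetical helper: run_checks INPUT_FILE RESULTS_FILE CHECKER
run_checks() {
    local input=$1 output=$2 checker=$3 line
    # IFS= keeps surrounding spaces, -r keeps backslashes literal, and
    # the || [ -n "$line" ] test still handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue      # skip blank lines
        # Runs in the foreground: the next URL is read only after this
        # call has finished. </dev/null keeps the checker off our input.
        "$checker" "$line" < /dev/null
    done < "$input" >> "$output"        # results file opened once
}
```

Usage: `run_checks url.txt results.txt ./Chktitle.sh`.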

Of course, you can add further statements inside the while loop:

while IFS= read -r line; do
    # Initialise var to the output of Chktitle.sh
    var=$(/path/to/Chktitle.sh "$line")

    # Add conditions
    if [ "$var" = "google" ]; then
        echo "google" >> results.txt
    else
        echo "not google" >> results.txt
    fi
done < /path/to/input.txt
masseyb
  • This looks like what I need. I forgot to ask: can you also add an if/else statement to your example? If the Chktitle result = google, save it to results; otherwise do nothing and move on to the next URL. – Leo Bishop Dec 06 '14 at 18:47
  • @LeoBishop: Edited. Hope that helps. To be honest, I would also add a test to verify that the result file exists. A good starting point could be this (sorry for the lack of line breaks): `base="/tmp"; result="result.txt"; if [ ! -d "$base" ]; then mkdir -p "$base"; touch "$base/$result"; fi`. Of course, the directory could exist but not the file, or both could exist and you might want to overwrite result.txt on each run, and so forth. – masseyb Dec 07 '14 at 09:31
  • It would be more efficient to do the redirection outside the loop, i.e. `while read -r line; do ...; done <input >output` – tripleee Dec 07 '14 at 11:17
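The comment thread asks for a version that saves a result only when the title is google and otherwise moves straight on to the next URL. A minimal sketch of that, assuming an exact string match as in the answer's if-test; the function name save_matching, its parameters, and the choice to write the URL next to the title (so the saved lines stay meaningful) are all illustrative assumptions. It also keeps the redirection outside the loop, as suggested above.

```bash
#!/bin/bash
# Hypothetical helper: save_matching INPUT_FILE RESULTS_FILE CHECKER WANTED
save_matching() {
    local input=$1 output=$2 checker=$3 wanted=$4 line title
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        title=$("$checker" "$line" < /dev/null)
        # Write only on an exact match; anything else falls through to
        # the next URL without writing (the "else do nothing" case).
        if [ "$title" = "$wanted" ]; then
            printf '%s\t%s\n' "$line" "$title"
        fi
    done < "$input" >> "$output"
}
```

Usage: `save_matching url.txt results.txt ./Chktitle.sh google`.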

Here is how you could do this in Perl:

use warnings;
use strict;
use LWP::Simple;

my $inputFile = 'url.txt';
open (my $fh, '<', $inputFile) or die "Could not open file '$inputFile': $!\n";
while (<$fh>) {
    chomp(my $url = $_);   # copy the line and strip its trailing newline
    my $str=get($url);
    if (! defined $str) {
        warn "Could not find page '$url'\n";
        next;
    }
    my ($title)=$str=~ m{<title>(.*?)</title>}s;
    if (! defined $title) {
        warn "No title in document '$url'\n";
        next;
    }
    print "$title\n";
}
close ($fh);
Håkon Hægland
cat url.txt | xargs -I{} ./Chktitle.sh {} >> results.txt

See the xargs documentation, especially the -I option.

This xargs call reads the input (url.txt) line by line and calls ./Chktitle.sh once per line, with that line as the parameter. By default xargs runs one command at a time, so each call finishes before the next one starts.

The {} is the placeholder for the line read. You can also write

cat url.txt | xargs -Ifoo ./Chktitle.sh foo >> results.txt

(with foo as the placeholder), but {} is the placeholder conventionally used with xargs.

PerlDuck

You can create a script that takes two parameters, as follows.

HOW THE SCRIPT WORKS ON THE COMMAND LINE

<script> <path to url file> <path to script to execute>

The code is broken down below, with explanations.

STEP 1

#!/bin/bash
rm -f "/root/Desktop/result.txt" 2> /dev/null

Remove any existing result.txt so that the script starts with a fresh, empty file.

STEP 2

while read -r my_url; do 
"$2" "$my_url" >> "/root/Desktop/result.txt" 
done < "$1"

Set up a while loop to read every line of the url file (passed as "$1").

Each line read is stored in the variable my_url.

For each line, the loop runs your script (Chktitle.sh, passed as "$2") with my_url as its argument and appends the output to result.txt.

NOW LET US COMBINE ALL THE CODE INTO ONE SCRIPT, AS FOLLOWS

#!/bin/bash
rm -f "/root/Desktop/result.txt" 2> /dev/null
while read -r my_url; do
    "$2" "$my_url" >> "/root/Desktop/result.txt"
done < "$1"
repzero
  • Why would you write into the `root` hierarchy, and why would `root` have a `Desktop` in the first place? Unnerving. – tripleee Dec 07 '14 at 11:16
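Following up on that comment, here is a sketch of the combined script with no hard-coded /root/Desktop path: the results file becomes an optional third argument. The function name fresh_run and the default results.txt are hypothetical choices, not part of the answer.

```bash
#!/bin/bash
# Hypothetical helper: fresh_run URL_FILE CHECKER [RESULTS_FILE]
fresh_run() {
    if [ "$#" -lt 2 ]; then
        echo "usage: fresh_run URL_FILE CHECKER [RESULTS_FILE]" >&2
        return 1
    fi
    local input=$1 checker=$2 output=${3:-results.txt} my_url

    # ": >" empties the file (creating it if needed), replacing the
    # rm -f step without touching anything outside the chosen path.
    : > "$output" || return 1

    while IFS= read -r my_url || [ -n "$my_url" ]; do
        [ -z "$my_url" ] && continue
        "$checker" "$my_url" < /dev/null
    done < "$input" >> "$output"
}
```

Usage: `fresh_run url.txt ./Chktitle.sh results.txt`. Truncating with `: >` rather than deleting keeps the output location entirely under the caller's control, and a missing argument is reported instead of silently reading from an empty path.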