I am trying to recursively download several files using wget -m, and I intend to grep all of the downloaded files for specific text. Currently, I can wait for wget to complete fully and then run grep. However, the wget run is time-consuming because there are many files, so I would instead like to show progress by grepping each file as it finishes downloading and printing the matches to stdout, all before the next file downloads.

Example:

download file1
  grep file1 >> output.txt
download file2
  grep file2 >> output.txt
...

Thanks for any advice on how this could be achieved.

RogueBaneling

2 Answers

As c4f4t0r pointed out

 wget -m -O - <websites> | grep --color 'pattern'

Using grep's --color option to highlight the matches can help, especially when a lot of output is going to the terminal.

EDIT:

Below is a command line you can use. It saves wget's output messages to a file called file, and then tails that file.

awk finds any line containing "saved" and extracts the filename from it, then grep searches that file for the pattern.

 wget -m websites &> file & tail -f -n1 file | awk -F "\'|\`" '/saved/{system("grep --colour pattern " $2)}'
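
The same idea (watch wget's log for "saved" lines, then grep the file each one names) can also be sketched as a small shell function that reads the log from a pipe instead of a temporary file. This is only a sketch under some assumptions: the helper name grep_saved and the $url variable are made up for illustration, the log lines are assumed to look like `... - 'path/to/file' saved [size]` (the ASCII-quoted form expected when wget runs with LC_ALL=C), and file names containing an apostrophe are not handled.

```shell
#!/bin/sh
# grep_saved PATTERN   (hypothetical helper, not part of wget)
# Reads wget log lines on stdin. For every line that reports a saved
# file, extracts the quoted file name and greps that file for PATTERN.
grep_saved() {
    pattern=$1
    while IFS= read -r line; do
        # Assumed log format: ... - 'path/to/file' saved [size]
        # (older wget versions open the quote with a backtick instead).
        file=$(printf '%s\n' "$line" |
            sed -n "s/.*[\`']\(.*\)' saved.*/\1/p")
        # Skip lines that are not "saved" reports or name a missing file.
        [ -n "$file" ] && [ -f "$file" ] || continue
        # -H (GNU and BSD grep) prefixes each match with the file name.
        grep -H -e "$pattern" -- "$file"
    done
    return 0
}

# Possible usage, with $url standing in for the site to mirror:
#   LC_ALL=C wget -m "$url" 2>&1 | grep_saved 'pattern' >> output.txt
```

Because the log arrives through a pipe, there is no need to sleep or to wait for a log file to exist. wget does not pause while grep runs, though, so a match may be printed slightly after the next download has started.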
repzero
  • -mO - doesn't work because wget can't find the next link to recursively download. – RogueBaneling Feb 15 '15 at 15:03
  • @RogueBaneling hmmm..interesting...answer edited showing a command line you can use – repzero Feb 15 '15 at 17:47
  • I played around with that a bit and was able to get it working with this: `wget -m -O file.txt http://google.com 2> /dev/null & sleep 1 && tail -f -n1 file.txt | grep pattern`. Originally the `tail` command was not working and I believe it is because file.txt was not created by the time `tail` was executing, so that is why I added in the `sleep`. – RogueBaneling Feb 15 '15 at 19:45

Based on Xorg's solution I was able to achieve my desired effect with some minor adjustments:

wget -m -O file.txt http://google.com 2> /dev/null & sleep 1 && tail -f -n1 file.txt | grep pattern

This will print out all lines that contain pattern to stdout, and wget itself will produce no output visible from the terminal. The sleep is included because otherwise file.txt would not be created by the time the tail command executed.

As a note, this command will miss any results that wget downloads within the first second.

RogueBaneling