
I want to use a text browser such as lynx, w3m or links to run a bulk query over a list of links. Each result should be filtered for a keyword and then appended to the matching entry in the original list. As an example, let the list be in list.txt:

"http://dict.cc//?s=Chemical"
"http://dict.cc//?s=Fenster"

I can extract the result if I submit only one link at a time, e.g.

head -n 1 list.txt | xargs links -dump | sed -n '/NOUN/p'
tail -n 1 list.txt | xargs links -dump | sed -n '/NOUN/p'

works as expected, but not:

cat list.txt | xargs links -dump | sed -n '/NOUN/p'

or

for line in `cat list.txt`; do links -dump $line ; done

What am I doing wrong? As a next step, the output should be appended to the matching line of the list, so that list.txt looks like this after the operation:

"http://dict.cc//?s=Chemical" edit  NOUN   a chemical | chemicals       -
"http://dict.cc//?s=Fenster" NOUN   das Fenster | die Fenster    edit

This should be possible by combining the browser with other tools such as paste. The following does not work either, for the same reason as above. What would be a better solution?

for line in `cat list.txt`; do echo -n $line && links -dump $line; done

The example is just for demonstration; I will use sites other than dict.cc. Unfortunately, no API/REST interface is available.

smartmic

2 Answers


xargs passes more than one argument at once to the program unless you limit the number of arguments per call: xargs -n1 links -dump. Maybe links only accepts one argument. If you only need the document at the URL itself and not any other linked documents, you may also use curl.
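To see the batching without touching the network, echo can stand in for the real call. This is only a sketch: the urls helper is made up for the demonstration and simply prints the two entries from the question.

```shell
# Hypothetical helper: prints the two entries exactly as they appear in list.txt.
urls() {
  printf '%s\n' '"http://dict.cc//?s=Chemical"' '"http://dict.cc//?s=Fenster"'
}

# Default: xargs builds ONE command line containing both URLs.
urls | xargs echo links -dump
# prints: links -dump http://dict.cc//?s=Chemical http://dict.cc//?s=Fenster

# With -n 1: xargs builds one command line per URL.
urls | xargs -n 1 echo links -dump
# prints: links -dump http://dict.cc//?s=Chemical
#         links -dump http://dict.cc//?s=Fenster
```

Dropping the echo from the second form gives the real call, one links process per URL.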

for line in `cat list.txt` splits the file's content at any whitespace, not just at line breaks. So it will not work if any line in list.txt contains spaces.

Try this to iterate over the list:

# Read list.txt line by line; quoting "$line" keeps each entry as one argument.
while IFS= read -r line; do
  printf '%s' "$line" && links -dump "$line"
done < list.txt
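For the second step (appending the filtered result to each entry), something along these lines might do. It is a sketch under a few assumptions: annotate_list and the FETCH variable are names invented here, the entries may carry literal double quotes that have to be stripped before the URL is fetched, and all matching lines of a page are joined into one line with paste.

```shell
# FETCH is a hypothetical knob naming the dump command; default is links -dump.
FETCH=${FETCH:-links -dump}

# annotate_list FILE PATTERN
# Print every line of FILE followed by the lines of the fetched page that
# match PATTERN, joined into a single line.
annotate_list() {
  while IFS= read -r line; do
    url=${line#\"}   # drop a leading double quote, if present
    url=${url%\"}    # drop a trailing double quote, if present
    # $FETCH is deliberately unquoted so "links -dump" splits into command
    # and option; </dev/null keeps the fetcher away from the loop's input.
    match=$($FETCH "$url" < /dev/null | sed -n "/$2/p" | paste -s -d ' ' -)
    printf '%s %s\n' "$line" "$match"
  done < "$1"
}
```

It could then be called as annotate_list list.txt NOUN > list.new && mv list.new list.txt. Writing to a second file first avoids truncating list.txt while it is still being read.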
sapanoia

I fiddled with the commands until I found the bug. The problem is the double quotes around the URLs in list.txt: inside the for loop they are passed to links as part of the URL. After removing them, this works fine:

for line in `cat engl.txt`; do
  echo -n $line && links -dump $line | sed -n '/NOUN/p'
done

If the double quotes have to stay, feeding each entry to links through xargs works, because xargs removes the quotes before it calls links (the loop just above does not work in that case):

for line in `cat list.txt`; do 
  echo -n $line && echo $line | xargs links -dump | sed -n '/NOUN/p'
done
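The difference can be checked without calling links at all. A small sketch, with echo standing in for the browser and a single entry hard-coded for illustration:

```shell
line='"http://dict.cc//?s=Fenster"'   # one entry exactly as stored in list.txt

# Quotes inside a variable's value are ordinary characters; the shell does
# not re-parse them after expansion, so they stay part of the argument.
printf '%s\n' "$line"
# prints: "http://dict.cc//?s=Fenster"

# xargs parses its input itself and removes the quotes.
printf '%s\n' "$line" | xargs echo
# prints: http://dict.cc//?s=Fenster
```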
smartmic