How can I manipulate the output text of grep?

Right now I am using the command:

grep -i "<url>" $file  >> ./txtFiles/$file.txt

This would output something like this:

<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>

and then the next text will go to the next line.

How can I get rid of the <url> and </url> tags and stop the output from ending with a newline, so that the next text follows on the same line?

Zombo
Tom D
  • Get rid of the what? Maybe you want to remove the newline character; in that case, pipe the output through tr "\n" " ". – lc2817 Apr 25 '13 at 05:31

2 Answers

sed '/<\/*url>/!d;s///g'
  • <\/*url> matches both start and end tag
  • Delete lines that don't have this
  • Then remove all cases of this pattern

With your example, it might look like this:

sed '/<\/*url>/!d;s///g' "$file" >> "./txtFiles/$file.txt"
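To see what the one-liner does, here is a minimal sketch run against a made-up input file. The temporary directory, the file name recipe.xml, and the <title> line are assumptions for illustration only; the URL line is the one from the question.

```shell
# Hypothetical sample input; file name and <title> line are invented for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '<title>Chicken Curry Salad</title>' \
  '<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>' \
  > "$tmpdir/recipe.xml"

# /<\/*url>/!d  deletes every line that contains neither <url> nor </url>.
# s///g         has an empty pattern, so sed reuses the last regex and
#               replaces every match of it (both tags) with nothing.
extracted=$(sed '/<\/*url>/!d;s///g' "$tmpdir/recipe.xml")
printf '%s\n' "$extracted"

rm -r "$tmpdir"
```

Only the bare URL should be printed; the <title> line is dropped because it never matches the address.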
Nimantha
Zombo
  • Thanks, this works. One last thing though: it still goes to the next line after the url. Do you know how I could get rid of this, so that the next text follows right after? – Tom D Apr 25 '13 at 23:10
  • Or maybe it is how I am adding the next line. I am using printf, so I assume that is what causes the following text to go to the next line. Is there something else I should be using to append the text at the end of the line instead of on a new line? – Tom D Apr 25 '13 at 23:28
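One possible way to handle the trailing newline raised in the comments, sketched under some assumptions: the output file out.txt, the temporary directory, and the placeholder text " next text" are invented here. The idea is that command substitution drops trailing newlines, and printf with a '%s' format adds none of its own.

```shell
# Hypothetical input and output files, created only for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>' \
  > "$tmpdir/recipe.xml"

# $(...) strips the trailing newline from the captured sed output.
url=$(sed '/<\/*url>/!d;s///g' "$tmpdir/recipe.xml")

# '%s' with no '\n' appends the URL without ending the line.
printf '%s' "$url" >> "$tmpdir/out.txt"
# Whatever is appended next continues on the same line (placeholder text).
printf '%s\n' ' next text' >> "$tmpdir/out.txt"

result=$(cat "$tmpdir/out.txt")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

If the file contains several URL lines, the newlines between them are kept; in that case the tr "\n" " " approach suggested in the first comment may be the better fit.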

Single commands:

sed -n '/<url>/ { s|<url>\(.*\)</url>|\1| ; p ; }' INPUT > OUTPUT

Or with awk:

awk -F "</?url>" '/<url>/ { print $2 }' INPUT > OUTPUT

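Note: both may give wrong output if more than one <url>...</url> pair occurs on a single line: the sed pattern is greedy and keeps everything between the first <url> and the last </url>, while the awk version prints only the first URL. Pipe (|) characters in the input are not a problem for the sed version; the | delimiter would only need changing if the pattern or replacement itself had to contain a pipe.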
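If several <url>...</url> pairs can appear on one line, one possible workaround is to split the matches first with grep -o. This is a sketch, not part of the commands above: the sample line and example.com URLs are invented, it assumes the URLs contain no < character, and -o is supported by GNU and BSD grep but is not required by POSIX.

```shell
# Made-up sample line with two tag pairs on it.
line='<url>http://example.com/a</url> and <url>http://example.com/b</url>'

# grep -o prints each match on its own line; [^<]* stops a match at the
# next '<', so it cannot run past its own closing tag.
# sed then strips the opening and closing tags from each match.
urls=$(printf '%s\n' "$line" | grep -o '<url>[^<]*</url>' | sed 's/<\/*url>//g')
printf '%s\n' "$urls"
```

Each URL ends up on its own line, which can then be joined or appended as needed.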

Zsolt Botykai