How to manipulate text with awk?

Question

How can I manipulate the text that grep outputs?

Right now I am using the command:

grep -i "<url>" $file  >> ./txtFiles/$file.txt

This would output something like this:

<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>

and whatever I append next starts on a new line.

How can I get rid of the <url> and </url> tags and keep the output from ending in a newline, so that the next text follows on the same line?

get rid of the what? maybe you want to remove the new line character, in that case pipe it in tr "\n" " " — lc2817, Apr 25 '13 at 05:31

score 2 · Answer 1 · edited Nov 02 '21 at 08:56

sed '/<\/*url>/!d;s///g'

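<\/*url> matches both the opening and the closing tag.
/<\/*url>/!d deletes every line that does not contain a tag.
s///g then removes every occurrence of the tag; the empty pattern reuses the last regular expression that was matched.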

With your example, it might look like this

sed '/<\/*url>/!d;s///g' "$file" >> "./txtFiles/$file.txt"
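
The question also asks for the output not to end in a newline. A minimal sketch of one way to do that, assuming a made-up sample file (the file contents and the variable name url are only for illustration): pipe the sed output through tr -d '\n' and write the result with printf '%s', which adds no newline of its own.

```shell
# Hypothetical sample input; the contents are made up for illustration.
file=$(mktemp)
printf '%s\n' '<title>Chicken Curry Salad</title>' \
  '<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>' > "$file"

# Keep only lines that contain a url tag, strip the tags,
# then delete every newline character from the result.
url=$(sed '/<\/*url>/!d;s///g' "$file" | tr -d '\n')

# printf '%s' prints the value as-is, without appending a newline,
# so whatever is written next continues on the same line.
printf '%s' "$url"

rm -f "$file"
```

Note that tr -d '\n' removes all newlines, so if several lines match, their URLs are run together; use tr '\n' ' ' instead if you want them separated by spaces.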

edited Nov 02 '21 at 08:56

Nimantha


answered Apr 25 '13 at 05:49

Zombo


Thanks, this works. One last thing, though: it still goes to the next line after the URL. Do you know how I could get rid of that, so that the next text follows right after? – Tom D Apr 25 '13 at 23:10
Or maybe it is how I am adding the next line. I am using printf, so I assume that is what causes the following text to start on a new line. Is there something else I should use to append the text at the end of the line instead of on a new line? – Tom D Apr 25 '13 at 23:28

score 0 · Answer 2 · answered Apr 25 '13 at 07:30

Single commands:

sed -n '/<url>/ { s|<url>\(.*\)</url>|\1| ; p ; }' INPUT > OUTPUT

Or with awk:

awk -F "</?url>" '/<url>/ { print $2 }' INPUT > OUTPUT

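Note: both may give unexpected output if more than one <url>...</url> pair occurs on a single line: awk prints only the first URL, and sed's greedy .* keeps everything between the first <url> and the last </url>, inner tags included. A pipe (|) in the data is harmless, since | is only the delimiter of the s command; it would need escaping only if it appeared in the pattern or replacement itself.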
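
If a line can hold several <url>...</url> pairs, one possible approach is to loop over the matches with awk's match() function. This is only a sketch: the function name extract_urls is made up for illustration, and it assumes the URLs themselves contain no < character. It prints the URLs separated by spaces and without a trailing newline.

```shell
# extract_urls is a hypothetical helper name, not part of any tool.
# It reads the files given as arguments, or stdin if there are none.
extract_urls() {
  awk '{
    line = $0
    # Find each <url>...</url> pair on the line, left to right.
    while (match(line, /<url>[^<]*<\/url>/)) {
      # RSTART and RLENGTH describe the match; skip the 5-character
      # opening tag and drop the 6-character closing tag (11 in total).
      printf "%s%s", sep, substr(line, RSTART + 5, RLENGTH - 11)
      sep = " "
      # Continue searching in the rest of the line.
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Usage would look like extract_urls INPUT >> OUTPUT. Because printf in awk emits no newline unless asked to, the next text appended to OUTPUT follows on the same line.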
