26

I have a log file trace.log. In it I need to grep for the content contained between the strings <tag> and </tag>. There are multiple sets of this pair of strings, and I just need to return the content between the last set (in other words, from the tail of the log file).

Extra Credit: Any way I can return the content contained within the two strings only if the content contains "testString"?

Thanks for looking.

EDIT: The search parameters <tag> and </tag> are on different lines, with about 100 lines of content separating them. The content is what I'm after...

rs79
    Examples of input might help; it's not clear whether the tags are on same line or on different ones. – devnull Oct 30 '13 at 12:02
    the tags are on different lines ..and we're looking at about 70-100 lines of content within the tags. – rs79 Oct 30 '13 at 12:12
    Rather than putting this information in the comments, update your question. Apparently, the responses that you've received assume that the tags are on the same line. – devnull Oct 30 '13 at 12:18

5 Answers

35

Use tac to print the file in reverse line order, then grep -m1 to stop after the first matching line. The look-behind and look-ahead assertions match only the text between <tag> and </tag>.

tac a | grep -m1 -oP '(?<=tag>).*(?=</tag>)'

Test

Given this file

$ cat a
<tag> and </tag>
aaa <tag> and <b> other things </tag>
adsaad <tag>and  last one</tag>

$ tac a | grep -m1 -oP '(?<=tag>).*(?=</tag>)'
and  last one

Update

EDIT: The search parameters <tag> and </tag> are on different lines, with about 100 lines of content separating them. The content is what I'm after...

Then it is a bit more tricky:

tac file | awk '/<\/tag>/ {p=1; split($0, a, "</tag>"); $0=a[1]};
                /<tag>/   {p=0; split($0, a, "<tag>");  $0=a[2]; print; exit};
                p' | tac

The idea is to reverse the file and use a flag p to track whether the closing </tag> has been seen yet. Printing starts when </tag> appears and finishes when <tag> is reached (because we are reading the file backwards).

  • split($0, a, "</tag>"); $0=a[1]; gets the data before </tag>
  • split($0, a, "<tag>" ); $0=a[2]; gets the data after <tag>

Test

Given a file a like this:

<tag> and </tag>
aaa <tag> and <b> other thing
come here
and here </tag>

some text<tag>tag is starting here
blabla
and ends here</tag>

The output will be:

$ tac a | awk '/<\/tag>/ {p=1; split($0, a, "</tag>"); $0=a[1]}; /<tag>/ {p=0; split($0, a, "<tag>"); $0=a[2]; print; exit}; p' | tac
tag is starting here
blabla
and ends here
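
For the extra-credit part, one possible sketch wraps the pipeline above in a shell function and prints the block only when it contains the search string. The function names last_tag_block and last_tag_block_if are made up for illustration, and the sketch assumes tac is available:

```shell
# Hypothetical helper: print the content of the last <tag>...</tag> block in file $1.
last_tag_block() {
    tac "$1" | awk '/<\/tag>/ {p=1; split($0, a, "</tag>"); $0=a[1]}
                    /<tag>/   {p=0; split($0, a, "<tag>");  $0=a[2]; print; exit}
                    p' | tac
}

# Hypothetical helper: print that block only if it contains the string $2;
# otherwise print nothing and return a non-zero status.
last_tag_block_if() {
    block=$(last_tag_block "$1")
    case $block in
        *"$2"*) printf '%s\n' "$block" ;;
        *)      return 1 ;;
    esac
}
```

It would be called as last_tag_block_if trace.log testString. Note that this only checks the last block; it does not search backwards for the last block that happens to contain the string.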
fedorqui
26

If, like me, you don't have access to tac because your sysadmin won't play ball, you can try:

grep pattern file | tail -1
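
As a concrete instance of this approach for the single-line case, here is a sketch that assumes GNU grep with -P support and at most one <tag>...</tag> pair per line. The function names are hypothetical:

```shell
# Hypothetical helper: print the content of the last single-line
# <tag>...</tag> pair in file $1, without using tac.
last_tag_line() {
    grep -oP '(?<=<tag>).*(?=</tag>)' "$1" | tail -n 1
}

# Hypothetical helper: same, but only consider contents that contain
# the fixed string $2 (the "extra credit" filter).
last_tag_line_matching() {
    grep -oP '(?<=<tag>).*(?=</tag>)' "$1" | grep -F -- "$2" | tail -n 1
}
```

For example, last_tag_line_matching trace.log testString would print the last tagged content that mentions testString. This does not help when the tags are on different lines.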
SlackGadget
1

An alternative to grep would be sed:

tac file | sed -n '0,/<tag>\(.*\)<\/tag>/s//\1/p'

tac file prints the file in reverse order (cat backwards), then sed processes the input from line 0 up to the first occurrence of <tag>.*</tag> (the 0,/regex/ address form is a GNU sed extension) and replaces <tag>.*</tag> with only the part that was inside the tags. The p flag prints the result, since default output is suppressed by -n.

Edit: This does not work if <tag> and </tag> are on different lines. We can still use sed for that:

tac file | sed -n '/<\/tag>/,$p; /<tag>/q' | sed 's/.*<tag>//; s/<\/tag>.*//' | tac

Again we use tac to read the file backwards, then the first sed command prints from the first occurrence of </tag> and quits when it finds <tag>. Only the lines in between are printed. Then we pass the result to another sed process to strip the tags and finally reverse the lines again with tac.

pfnuesel
0
perl -e '$/=undef; $f=<>; push @a,$1 while($f=~m#<tag>(.*?)</tag>#msg); print $a[-1]' ex.txt

Extra Credit: Any way I can return the content contained within the two strings only if the content contains "testString"?

perl -e '$/=undef; $f=<>; push @a,$1 while($f=~m#<tag>(.*?)</tag>#msg); print $a[-1] if ($a[-1]=~/testString/);' ex.txt
fedorqui
Vorsprung
0

A little untested awk that handles multiple lines:

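awk '
    BEGIN     {keep = 0}
    /<tag>/   {keep = 1; retain = ""}      # opening tag: clear the buffer and start keeping
    keep      {retain = retain $0 "\n"}    # append the current line (awk concatenates by juxtaposition)
    /<\/tag>/ {keep = 0}                   # closing tag: stop keeping
    END       {printf "%s", retain}        # print the last retained block
' filename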

We start by just reading the file; when we hit a <tag>, we start keeping lines. When we hit a </tag>, we stop. If we hit another <tag>, we clear the retained string and start again. If you want all the strings, print at each </tag>.
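
A sketch of the "print at each closing tag" variant, written as a self-contained function. The function name all_tag_blocks and the "--" separator line are assumptions made for illustration; the tag lines themselves are kept whole, as in the script above:

```shell
# Hypothetical helper: print every <tag>...</tag> block in file $1,
# followed by a "--" separator line after each block.
all_tag_blocks() {
    awk '
        /<tag>/   {keep = 1; retain = ""}        # opening tag: reset the buffer
        keep      {retain = retain $0 "\n"}      # buffer lines while inside a block
        /<\/tag>/ {                              # closing tag: flush the buffer
            if (keep) printf "%s--\n", retain
            keep = 0
        }
    ' "$1"
}
```

Piping the result through a filter for "testString" would then be one way to inspect only the blocks of interest.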

mpez0