
I have an XML file with a lot of <A_tag> elements in it.

I need to see those A_tag elements (together with their children, i.e. each tag's whole content) that contain at least one <C_tag>.

So this block should match (and therefore be included in the result):

<A_tag>
    ...
    ...
    <C_tag attr1="" ... attrn="" />
    ...
</A_tag>

I tried pcregrep, but I don't know how to specify a block terminator that is longer than one character (</A_tag> is longer than that; a single-character class such as [^>] would have been easy for me).

I also tried awk, but couldn't get it to work either.

If someone experienced can help, please also make the command separate the matched blocks with an empty line, so that I can learn from it.

Törpetestű

3 Answers


Following up on the xmllint comment:

xmllint --xpath '(//A_tag/C_tag/..)' x.xml

This looks for a C_tag directly under an A_tag and then selects the parent A_tag.

Output:

<A_tag>
    <C_tag attr1="" attrn=""/>
</A_tag>
dash-o

In my case, this was the solution:

xmllint --shell x.xml <<< 'xpath //A_tag//C_tag/ancestor::A_tag'

I had to use this form because my xmllint version doesn't support the --xpath option. Also, C_tag can be any descendant of A_tag, not just a direct child (which I didn't make clear in the question). That said, dash-o's answer appears to be correct.

My only problem is that the XML file I'm working with contains 4.5 million lines, and xmllint turned out to be slow on it, because it actually parses the file.

If you have a more general solution that uses awk or pcregrep, please share it. They would suit this case well, since they only match patterns.

Otherwise I'll accept the original answer tomorrow.

Törpetestű

If the file is pretty-printed (or follows similar rules), it is possible to write a small awk script that only acts on the A_tag and C_tag lines:

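awk '
/<A_tag>/        { buf = $0; in_a = 1; has_c = 0; next }  # opening line: start buffering
in_a             { buf = buf RS $0 }                      # inside a block: append the line
in_a && /<C_tag/ { has_c = 1 }                            # remember that a C_tag was seen
/<\/A_tag>/      { if (in_a && has_c) print buf RS        # matching block, then an empty line
                   in_a = 0; has_c = 0; buf = "" }        # always reset at the end of a block
' x.xml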
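
For the "block ending longer than one character" part of the question, a different awk technique may help: using the closing tag itself as the record separator. The following is only a sketch under stated assumptions. It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX leaves this unspecified), that A_tag elements are not nested, and that the opening tag is written exactly as <A_tag> with no attributes. The file names sample.xml and matches.txt are made up for the demonstration.

```shell
# Hypothetical sample input; only the tag names come from the question.
cat > sample.xml <<'EOF'
<root>
<A_tag>
    <B_tag/>
</A_tag>
<A_tag>
    <B_tag>
        <C_tag attr1="" />
    </B_tag>
</A_tag>
<A_tag><C_tag attr1="x"/></A_tag>
</root>
EOF

# Assumption: this awk treats a multi-character RS as the record terminator,
# so every record ends right before a closing </A_tag>.
awk -v RS='</A_tag>' '
{
    start = index($0, "<A_tag>")        # position of the opening tag in this record
    if (start == 0) next                # e.g. the tail of the file after the last block
    block = substr($0, start)           # drop whatever precedes the opening tag
    if (block ~ /<C_tag/)               # a C_tag at any depth inside the block
        printf "%s</A_tag>\n\n", block  # restore the closing tag, add an empty line
}' sample.xml > matches.txt

cat matches.txt
```

Because each record is a whole block, this version does not depend on pretty-printing, and the C_tag may be a descendant at any depth. It still only matches text and does not parse XML, so comments or CDATA sections that contain these tag names would confuse it.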
dash-o