I need to find attribute values based on other values pulled from parent's/grand-parent's sibling's children. I think it's going to take 2 different expressions.

So given the following XML (which is derived from a log file that can be thousands of lines long):

<p:log xmlns:p="urn:NamespaceInfo">
 <p:entries>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:attributes>
       <p:attrib name="Position" value="1B2" />
       <p:attrib name="Something" value="Something_else" />
     </p:attributes>
     <p:msg>
     </p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:attributes>
       <p:attrib name="Form" value="FormA" />
     </p:attributes>
     <p:msg>
     </p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Successful....</p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T12:12:12">
     <p:attributes>
       <p:attrib name="Position" value="1B3" />
       <p:attrib name="Something" value="Something_else" />
     </p:attributes>
     <p:msg>
     </p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:attributes>
       <p:attrib name="Form" value="FormB" />
     </p:attributes>
     <p:msg>
     </p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Processing....</p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Error1</p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
    <p:msg>Error1</p:msg>
   </p:entry>
 </p:entries>
     ...
</p:log>
  • (<p:attributes> parent tags can have multiple <p:attrib> child tags)
  • (<p:entry> tags can only have one <p:msg> tag)

First, I need to grab the value of the value attribute whose corresponding name attribute is Position, but only if a sibling of its grand-parent p:entry has a child p:msg with the text Error1. Also, it needs to stay within that section: I don't want the first Position/value pair (1B2), because a new Position/value pair (1B3) appears before the Error1, even though technically the p:msg with Error1 is in a sibling of both grand-parents.

Next, I need the timestamp attribute's value from the p:entry that contains the Position/value pair I just grabbed. So, find the Position, then find the timestamp attribute value of its grand-parent p:entry tag.

So for this example, I should be able to retrieve the following values only:

1B3

2012-12-31T12:12:12 (the date/time stamps given are arbitrary values. This one is different so you know which one I was referencing).

Kind of confusing, I know. I will also need to make sure I grab just one instance, because I am using XQuery to get the data out of a database and each expression has to return a single value.

I can get to the first timestamp associated with the p:msg with Error1 with the following: //p:entry[descendant::p:msg='Error1'][1]/@timestamp

but can't seem to get back up the tree to get the other values.

I can get the timestamps of the p:entry elements that have a Position p:attrib grand-child (here limited to the first one) with: (//p:entry[descendant::p:attrib[@name='Position']]/@timestamp)[1]

but I can't seem to limit it to just the one that has the 'Error1' following it. I can't base my selection on position. I have to base it first on content.

BONUS QUESTION

How could I do this again on the next instance down the log file? (not just the second Error1 message, the next time down the log file where the Error1 msg shows up for the next 'parent/sibling' match). This may be obvious once I get the answer to the questions above.

N1tr0

1 Answer

UPDATED:

OK I think I got this. Here's the answer to the first one:

//p:msg[text()="Error1"]/../preceding-sibling::p:entry[./*/p:attrib[@name="Position"]][1]/*/p:attrib[@name="Position"]/@value

This works back from the p:msg tag, which makes it easier to select the nearest preceding p:entry tag that has a grandchild p:attrib with the name Position. That's the [1] in there: on the preceding-sibling axis, positions count backwards from the context node, so [1] is the closest match before the Error1 entry.

Getting the timestamp is just a tad simpler:

//p:msg[text()="Error1"]/../preceding-sibling::p:entry[./*/p:attrib[@name="Position"]][1]/@timestamp

Try that out and see what you think.
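
For anyone who wants to check the logic outside an XPath engine, here is a minimal Python sketch of the same idea: find the first Error1 entry, then walk backwards to the nearest entry that carries a Position attrib. It uses only the standard library's xml.etree.ElementTree, which does not support the preceding-sibling axis, so the backwards walk is done by hand. The function name position_before_first_error and the trimmed sample are assumptions for illustration, not part of the original question.

```python
import xml.etree.ElementTree as ET

NS = {"p": "urn:NamespaceInfo"}

# Trimmed copy of the sample log from the question.
XML = """<p:log xmlns:p="urn:NamespaceInfo">
 <p:entries>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:attributes>
       <p:attrib name="Position" value="1B2" />
     </p:attributes>
     <p:msg></p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Successful....</p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T12:12:12">
     <p:attributes>
       <p:attrib name="Position" value="1B3" />
     </p:attributes>
     <p:msg></p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Error1</p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Error1</p:msg>
   </p:entry>
 </p:entries>
</p:log>"""


def position_before_first_error(xml_text, error_text="Error1"):
    """Return (value, timestamp) of the nearest Position entry before the first error."""
    entries = ET.fromstring(xml_text).findall("p:entries/p:entry", NS)
    for i, entry in enumerate(entries):
        msg = entry.find("p:msg", NS)
        # Whitespace is stripped here; the XPath text()="Error1" test is exact.
        if msg is None or (msg.text or "").strip() != error_text:
            continue
        # Walk backwards from the error entry, like preceding-sibling::p:entry[...][1].
        for prev in reversed(entries[:i]):
            attrib = prev.find("p:attributes/p:attrib[@name='Position']", NS)
            if attrib is not None:
                return attrib.get("value"), prev.get("timestamp")
        return None  # an error with no Position entry before it
    return None  # no error entry at all


result = position_before_first_error(XML)
print(result)
```

Because the function stops at the first Error1 entry, it returns a single pair, which is the shape the question asks for.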

ORIGINAL ANSWER:

Normally I don't post half-finished answers, but my guess is that you won't get anything else since this question is so complicated, so here's the xpath for what you describe in the first paragraph:

//p:entry[following-sibling::p:entry/p:msg/text()="Error1"]/*/p:attrib[@name="Position"]/@value

This will get "the value of the value attribute that has a corresponding name attribute of Position, but only if the grand-parent's sibling p:entry has a child p:msg with the text of Error1."

However I don't know what you mean when you say "it needs to stay within that section". Can you clarify? This will return both 1B2 and 1B3.

For the second part of your question, you can get the timestamp for the entries above with this:

//p:entry[following-sibling::p:entry/p:msg/text()="Error1" and ./*/p:attrib[@name="Position"]]/@timestamp

Again though, this won't do the "section" thing you mentioned. That's a bit more tricky, beyond my (current) knowledge of xpath unfortunately.
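
One way to picture the "section" behaviour is procedurally: treat every entry with a Position attrib as the start of a new section, and report a section only if an Error1 message shows up before the next Position entry. The Python sketch below does that with the standard library's xml.etree.ElementTree. It is a rough illustration under assumptions: the function name sections_with_error is made up, and the third section (1B4, timestamp 2012-12-31T13:00:00) is a hypothetical addition to the sample so that the "next instance down the log" case has something to find.

```python
import xml.etree.ElementTree as ET

NS = {"p": "urn:NamespaceInfo"}

# Trimmed sample from the question, plus a hypothetical third section (1B4).
XML = """<p:log xmlns:p="urn:NamespaceInfo">
 <p:entries>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:attributes><p:attrib name="Position" value="1B2" /></p:attributes>
     <p:msg></p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Successful....</p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T12:12:12">
     <p:attributes><p:attrib name="Position" value="1B3" /></p:attributes>
     <p:msg></p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Error1</p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T09:39:25">
     <p:msg>Error1</p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T13:00:00">
     <p:attributes><p:attrib name="Position" value="1B4" /></p:attributes>
     <p:msg></p:msg>
   </p:entry>
   <p:entry timestamp="2012-12-31T13:00:05">
     <p:msg>Error1</p:msg>
   </p:entry>
 </p:entries>
</p:log>"""


def sections_with_error(xml_text, error_text="Error1"):
    """Return one (value, timestamp) pair per section that contains the error."""
    hits = []
    current = None    # (value, timestamp) of the section being read
    reported = False  # True once the current section has been added to hits
    for entry in ET.fromstring(xml_text).iterfind("p:entries/p:entry", NS):
        attrib = entry.find("p:attributes/p:attrib[@name='Position']", NS)
        if attrib is not None:
            # A Position entry opens a new section.
            current = (attrib.get("value"), entry.get("timestamp"))
            reported = False
        msg = entry.find("p:msg", NS)
        text = (msg.text or "").strip() if msg is not None else ""
        if text == error_text and current is not None and not reported:
            hits.append(current)
            reported = True  # repeated errors in one section count once
    return hits


hits = sections_with_error(XML)
print(hits)
```

The 1B2 section is skipped because a new Position entry appears before any Error1, and the two consecutive Error1 entries under 1B3 produce a single result. Taking hits[0], hits[1], and so on gives the first match, the next match down the log, and so forth.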

Chris Salzberg
  • Update: I re-read your question and I think I now understand what you want. – Chris Salzberg Sep 25 '12 at 22:25
  • Thanks shioyama. This is close to what I need for the one part. Now I just need to be able to limit it to return one result. I think I'll create a new XML sample that is more generic but much clearer, so that readers will understand what 'sections' I am referring to. – N1tr0 Sep 26 '12 at 11:55
  • So I've been looking at your example and I think I have the general format for grabbing the timestamp but for some reason it's not pulling back any values. Here's what I have so far: `//p:entry[following-sibling::p:entry/p:msg/text()="Error1."]/*/p:entry[decendant::p:attrib[@name="Position"]]/@timestamp` – N1tr0 Sep 26 '12 at 12:11
  • I think I need to post this to the XQuery crowd. I just tried to run xpath above in an XQuery script and got an error back saying 'following-sibling' is not supported in SQL Server 2008. – N1tr0 Sep 26 '12 at 12:42
  • eh, if following-sibling is not supported then you're not going to get very far! – Chris Salzberg Sep 26 '12 at 12:46
  • Although, I'm not quite where I need to be, I'm really close. Thanks shioyama for taking the time to look into this. Once I get the XQuery working (hopefully), maybe I can just get the results I need by creating a temp table in sql and working on the data pulled from the xml. Still not sure how to limit to the first occurrence though. Using [1] doesn't work because that is based on the position of nodes in the document, not when the xpath first meets my qualifications. Oh much fun. – N1tr0 Sep 26 '12 at 13:02
  • Hi shioyama, I just saw your comment about following-sibling and I was afraid of that. I was hoping there might be a work around. – N1tr0 Sep 26 '12 at 13:04
  • Thanks for the updates to your answers! That definitely returns better results. I do get multiple returns still but I don't think I can get around that with the XML I have to deal with. – N1tr0 Sep 26 '12 at 14:05