XPATH: Parse XML need sibling's children and more

Question

I have this XML document (the top of the file is shown here):

<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3">
  <realmCode code="US" />
  <typeId extension="POCD_HD000040" root="2.16.840.1.113883.1.3" />
  <templateId root="1.2.840.114350.1.72.1.51693" />
  <templateId root="2.16.840.1.113883.10.20.22.1.1" />
  <templateId root="2.16.840.1.113883.10.20.22.1.1" extension="2015-08-01" />
  <templateId root="2.16.840.1.113883.10.20.22.1.2" />
  <templateId root="2.16.840.1.113883.10.20.22.1.2" extension="2015-08-01" />
  <id assigningAuthorityName="EPC" root="1.2.840.114350.1.13.535.2.7.8.688883.17473398" />
  <code code="34133-9" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="Summarization of Episode Note" />
  <title>Clinical Summary</title>
  <effectiveTime value="20181016153816-0400" />
  <confidentialityCode code="N" codeSystem="2.16.840.1.113883.5.25" displayName="Normal" />
  <languageCode code="en-US" />
  <setId assigningAuthorityName="EPC" extension="d5ccd6e6-4b6b-11e7-90e8-f508dff85edf" root="1.2.840.114350.1.13.535.2.7.1.1" />
  <versionNumber value="31" />
  <recordTarget>

This part is further down in the same file, and it contains the data I need to extract:

          <code code="10160-0" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="History of Medication Usage" />
          <title>Current Medications</title>
          <text>
             <table>
                <colgroup>
                   <col width="25%" />
                   <col width="25%" />
                   <col width="13%" />
                   <col width="12%" />
                   <col width="8%" />
                   <col width="8%" />
                   <col width="9%" />
                </colgroup>
                <thead>
                   <tr>
                      <th>Prescription</th>
                      <th>Sig.</th>
                      <th>Disp.</th>
                      <th>Refills</th>
                      <th>Start Date</th>
                      <th>End Date</th>
                      <th>Status</th>
                   </tr>
                </thead>
                <tbody>
                   <tr ID="currx6">
                      <td>
                         <paragraph ID="med6">Misc. Devices (BATH/SHOWER SEAT) Misc</paragraph>
                         <content styleCode="allIndent">
                            Indications:
                            <content ID="indication7">Mild cognitive impairment</content>
                            ,
                            <content ID="indication8">MGD (meibomian gland disease)</content>
                            ,
                            <content ID="indication9">Glaucoma suspect</content>
                            ,
                            <content ID="indication10">Nuclear sclerosis</content>
                         </content>
                      </td>
                      <td ID="sig6">Pt needs shower/bath bar to assist with getting in and out of bath tub/shower.</td>
                      <td>
                         <paragraph>1 Units</paragraph>
                      </td>
                      <td>0</td>
                      <td>06/21/2013</td>
                      <td />
                      <td>Active</td>
                   </tr>
                   <tr ID="currx11">
                      <td>
                         <paragraph ID="med11">Misc. Devices (HUGO ROLLING WALKER) Misc</paragraph>

I'm pretty much trying to get the paragraph ones with the ID only. I was using this

NodeList nodeList = (NodeList) xpath.evaluate(
    "//*[local-name()='code'][@code='10160-0']/following-sibling::*[local-name()='text']/table/tbody/tr/td/paragraph",
    new InputSource(new StringReader(docString)),
    XPathConstants.NODESET);

but it keeps telling me I have 0 nodes. If I change it to select just the table, it tells me I have 1 node, but that it's null. What exactly am I doing wrong?

SOLUTION: to get the paragraphs:

//*[local-name()='code'][@code='10160-0']/following-sibling::*[local-name()='text']//*[local-name()='paragraph']

To get only the ones that have an ID attribute:

//*[local-name()='code'][@code='10160-0']/following-sibling::*[local-name()='text']//*[local-name()='paragraph'][@ID]
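
For anyone who wants to try this outside the real file, here is a minimal sketch that runs the paragraph expression, with and without an [@ID] predicate on the final step, through the same xpath.evaluate call used in the question. The class name LocalNameXPathDemo, the SAMPLE string, and the wrapping section element are made up for illustration; SAMPLE is only a cut-down stand-in for the medications section, not the actual document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LocalNameXPathDemo {
    // Hypothetical, trimmed-down stand-in for the medications section.
    // It keeps the default namespace from the real document's root element.
    static final String SAMPLE =
        "<ClinicalDocument xmlns=\"urn:hl7-org:v3\"><section>"
      + "<code code=\"10160-0\"/><title>Current Medications</title>"
      + "<text><table><tbody><tr>"
      + "<td><paragraph ID=\"med6\">Misc. Devices (BATH/SHOWER SEAT) Misc</paragraph></td>"
      + "<td><paragraph>1 Units</paragraph></td>"
      + "</tr></tbody></table></text>"
      + "</section></ClinicalDocument>";

    // The expression from the solution: every step ignores the namespace.
    static final String PARAGRAPHS =
        "//*[local-name()='code'][@code='10160-0']"
      + "/following-sibling::*[local-name()='text']"
      + "//*[local-name()='paragraph']";

    // Evaluates the expression and returns the text content of each match.
    static List<String> select(String xml, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression,
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // All paragraphs under the matching section's text element.
        System.out.println(select(SAMPLE, PARAGRAPHS));
        // Only the paragraphs that carry an ID attribute.
        System.out.println(select(SAMPLE, PARAGRAPHS + "[@ID]"));
    }
}
```

On this sample the first call should return both paragraphs and the second only the med6 one. Note that the predicate goes after the closing bracket of the local-name() test, so it filters the element rather than the string literal.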

kjhughes · Answer 1 · 2018-10-16T19:02:45.733

1

I'm pretty much trying to get the paragraph ones with the ID only.

This XPath,

//*[@ID]

will select all elements that have an ID attribute, and this XPath,

//paragraph[@ID]

will select all paragraph elements that have an ID attribute, provided paragraph is in no namespace.
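
As a quick check of the two expressions, here is a small sketch that counts matches on a namespace-free fragment modeled on the table row in the question. The class name IdAttributeXPathDemo and the SAMPLE fragment are hypothetical; the fragment deliberately has no xmlns declaration, which the real document does have.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IdAttributeXPathDemo {
    // Hypothetical fragment with no namespace declaration at all.
    static final String SAMPLE =
        "<table><tbody><tr ID=\"currx6\">"
      + "<td><paragraph ID=\"med6\">Misc. Devices (BATH/SHOWER SEAT) Misc</paragraph></td>"
      + "<td ID=\"sig6\">Pt needs shower/bath bar.</td>"
      + "<td><paragraph>1 Units</paragraph></td>"
      + "</tr></tbody></table>";

    // Returns how many nodes the expression selects in SAMPLE.
    static int count(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression,
            new InputSource(new StringReader(SAMPLE)),
            XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws XPathExpressionException {
        // tr, paragraph and td all carry an ID in this fragment.
        System.out.println("//*[@ID] matches: " + count("//*[@ID]"));
        // Only one of the two paragraph elements has an ID.
        System.out.println("//paragraph[@ID] matches: " + count("//paragraph[@ID]"));
    }
}
```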

Other notes:

Don't use constructs such as //*[local-name()='code'] when no namespaces are in play; just use //code. (And if namespaces are in play, define a namespace prefix and reference them properly rather than defeating them. See How does XPath deal with XML namespaces?)
//*[local-name()='code'][@code='10160-0']/following-sibling::*[local-name()='text'] is failing because, in the XML as posted, text isn't a sibling of code. Perhaps you meant to use following:: instead.
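
To illustrate the first note, here is a rough sketch of binding a prefix with a NamespaceContext instead of using local-name(). It assumes the document uses the default namespace urn:hl7-org:v3 shown on the ClinicalDocument root element; the prefix hl7, the class names, and the cut-down SAMPLE document are arbitrary choices made for this example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PrefixedXPathDemo {
    static final String HL7_NS = "urn:hl7-org:v3";

    // Hypothetical, trimmed-down stand-in for the real document.
    static final String SAMPLE =
        "<ClinicalDocument xmlns=\"urn:hl7-org:v3\"><section>"
      + "<code code=\"10160-0\"/><title>Current Medications</title>"
      + "<text><table><tbody><tr>"
      + "<td><paragraph ID=\"med6\">Misc. Devices (BATH/SHOWER SEAT) Misc</paragraph></td>"
      + "<td><paragraph>1 Units</paragraph></td>"
      + "</tr></tbody></table></text>"
      + "</section></ClinicalDocument>";

    // Maps the arbitrary prefix "hl7" to the document's default namespace.
    static final class Hl7Context implements NamespaceContext {
        @Override public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "hl7".equals(prefix) ? HL7_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String namespaceURI) {
            return HL7_NS.equals(namespaceURI) ? "hl7" : null;
        }
        @Override public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                ? Collections.<String>emptyList().iterator()
                : Collections.singletonList(prefix).iterator();
        }
    }

    // Returns how many nodes the expression selects in the given XML.
    static int count(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new Hl7Context());
        NodeList nodes = (NodeList) xpath.evaluate(
            expression,
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Unprefixed names only match elements in no namespace.
        System.out.println(count(SAMPLE, "//paragraph[@ID]"));
        // Prefixed names match the elements in urn:hl7-org:v3.
        System.out.println(count(SAMPLE,
            "//hl7:code[@code='10160-0']/following-sibling::hl7:text//hl7:paragraph[@ID]"));
    }
}
```

The prefix in the XPath does not have to match anything in the XML file; only the namespace URI matters. This is also why a path with plain table/tbody/tr/td/paragraph steps selects nothing on a document like this one.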

edited Oct 16 '18 at 19:02

answered Oct 16 '18 at 18:25

kjhughes


maybe I need to explain better.. I need to open xml file, find all with ones and weed out the ones I dont if needed. I'll give what you said a try now – kevin_kss Oct 16 '18 at 19:30
In your posted XML, `code` has no descendant elements at all; it's empty. – kjhughes Oct 16 '18 at 19:38
My indenting was done by hand, so code, title and text are all siblings. So I assume I grab the code element with the code value I want, and then following-sibling gets the text element. Then what I need to do is get the paragraph elements way under that. Make sense? – kevin_kss Oct 16 '18 at 19:42
Please fix your XML to be well-formed, without any `+` and `-` outline symbols. Well-formed XML has a single root element. Then you can use basic tools to handle the indentation for you. Otherwise, you're wasting everyone's time on unnecessary distractions. After you've done that, we'll be able to show you how to write an XPath to any node you need. – kjhughes Oct 16 '18 at 19:44
Sorry about that. What's the best way to get it indented automatically? I just copied it from the IE window, but when you paste it, it messes it all up – kevin_kss Oct 16 '18 at 19:47
View the source. Copy from there, not from the outline presentation. – kjhughes Oct 16 '18 at 19:48
You still haven't posted well-formed XML. Without the root element of the XML, we don't know if there are namespaces in play. – kjhughes Oct 16 '18 at 20:18
here is the top of the XML – kevin_kss Oct 17 '18 at 12:40
I think we got it.. //*[local-name()='code'][@code='10160-0']/following-sibling::*[local-name()='text']//*[local-name()='paragraph'] – kevin_kss Oct 17 '18 at 12:47
