How can I browse and list the XPaths of an XML message?

**** See the EDIT portion below.

Thanks for looking into this issue. I am not sure whether this is the right forum for this question. If not, please let me know where to post it.

We have a complex XML message (data in XML format). We are looking for a way to extract all the XPaths of this XML message together with the element- and attribute-level data content. We tried XMLSpy and xmltwig, but with no luck. xml_grep pulls data if we give it an XPath as input, but it has no option to browse all the XPaths of an XML message.

I have a well-formed XML message. I want to produce a list/report of:

  1. All XPaths of the XML message (browse the message and list every XPath)

  2. Each XPath with its data content (browse the message and list every XPath alongside its value)

Here is an example (input XML message):

<?xml version="1.0"?>
<PARTS>
<TITLE>Computer Parts</TITLE>
<PART>
<ITEM>Motherboard</ITEM>
<MANUFACTURER>ASUS</MANUFACTURER>
<MODEL>P3B-F</MODEL>
<COST> 123.00</COST>
</PART>
<PART>
<ITEM>Video Card</ITEM>
<MANUFACTURER>ATI</MANUFACTURER>
<MODEL>All-in-Wonder Pro</MODEL>
<COST> 160.00</COST>
</PART>
<PART>
<ITEM>Sound Card</ITEM>
<MANUFACTURER>Creative Labs</MANUFACTURER>
<MODEL>Sound Blaster Live</MODEL>
<COST> 80.00</COST>
</PART>
<PART>
<ITEM>inch Monitor</ITEM>
<MANUFACTURER>LG Electronics</MANUFACTURER>
<MODEL> 995E</MODEL>
<COST> 290.00</COST>
</PART>
</PARTS>

The desired output (I created the following list manually):

/PARTS/TITLE                Computer Parts
/PARTS/PART[1]/ITEM         Motherboard
/PARTS/PART[1]/MANUFACTURER ASUS
/PARTS/PART[1]/MODEL        P3B-F
/PARTS/PART[1]/COST         123.00
/PARTS/PART[2]/ITEM         Video Card
/PARTS/PART[2]/MANUFACTURER ATI
............
..............
..................
...................

Is there any open-source product that produces such a report for an XML message?

What are the ways to extract the XPaths, or each XPath with its data content?

Thanks for letting me pick the brains of this forum.

+++++

Thanks. The above code outputs:

Field|Value
/*|

/*/*[1]|X
/*/*[2]|000000000
/*/*[3]|000000000
/*/*[4]|&
/*/*[5]|

I am not able to get XPaths with the element names as text.

Here is the input XML:

<CorrectedW2Ind>X</CorrectedW2Ind>
<EmployeeSSN>000000000</EmployeeSSN>
<EmployerEIN>000000000</EmployerEIN>
<EmployerNameControlTxt>&amp;</EmployerNameControlTxt>
<EmployerName>
    <BusinessNameLine1Txt>#</BusinessNameLine1Txt>
    <BusinessNameLine2Txt>#</BusinessNameLine2Txt>
</EmployerName>
<EmployerUSAddress>
    <AddressLine1Txt>0</AddressLine1Txt>
    <AddressLine2Txt>0</AddressLine2Txt>
    <CityNm>A</CityNm>
    <StateAbbreviationCd>PW</StateAbbreviationCd>
    <ZIPCd>00000</ZIPCd>
</EmployerUSAddress>

    <EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>

a) Which lxml method should I use with the above code to get the value together with an XPath that shows the element names as text?

b) Which lxml method should I use to aggregate (count) the nodes of a repeating group?

For example: XPath of EmployersUseGrp ====> 5

EDIT ===== 6/26/2019 ========================

I am not able to open new questions; I am getting a "question limit exceeded" message. So I am posting the follow-up to this code here.

I am trying to use the Python code posted in the answer, and I am getting weird output.

I have a large XML file (inputf.xml), shown below. I used this file as the input (input = inputf.xml) in the posted code.




    <?xml version="1.0" encoding="UTF-8"?>
      <DataFileFor>
        <DataR>
           <Id>5070022019330a0050hq</Id>
             <NUM>30221730001019</NUM>
             <Postmark>2020-01-03T09:25:57.000-05:00</Postmark>
             <TNO>47647</TNO>
.
.
.
.
.
</DataFileFor>

++++

When I grab the XPath of a node using xml_grep, I get the following.

xml_grep DataFileFor/DataR/Ret/W2 inputf.xml ===> output:


<?xml version="1.0" ?>

<xml_grep version="0.7" date="Fri Jun 26 13:07:11 2020">

<file filename="inputf.xml">

  <W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">

    <CorrectedW2Ind>X</CorrectedW2Ind>

    <EmployeeSSN>000000000</EmployeeSSN>

    <EmployerEIN>000000000</EmployerEIN>

    <EmployerNameControlTxt>S</EmployerNameControlTxt>

    <EmployerName>

      <BusinessNameLine1Txt>String</BusinessNameLine1Txt>

      <BusinessNameLine2Txt>String</BusinessNameLine2Txt>

    </EmployerName>

    <EmployerUSAddress>

      <AddressLine1Txt>String</AddressLine1Txt>

      <AddressLine2Txt>String</AddressLine2Txt>

      <CityNm>String</CityNm>

      <StateAbbreviationCd>AL</StateAbbreviationCd>

      <ZIPCd>000000000</ZIPCd>
.
.
.
.
.
</W2>

When I use the posted code, it does not produce readable XPaths. The output XPaths look like this:


/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[10]|X
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[11]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[12]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[13]|S
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[14]|String

The attributes

Id="W2" dName="W2" sId="00000000" sVersionNum="String"

are not showing up in the output.

What changes are required to the code to fix this?

Thanks for your guidance.

  • I am not clear with your question. Do you want to see all possible XPath that can be applied to your XML? or based on an output XML you wish to see what XPath was applied to your input XML? – Megha Oct 23 '15 at 03:12
  • "Do you want to see all possible XPath that can be applied to your XML? " YES, I am looking a tool/program that take XML Message as input (file) and output report/list ( ALL XPATH, data content of this XPATH in input XML Message). – user2647763 - RIMD Oct 23 '15 at 03:26
  • I dont think there is any such tool. The number of Xpath that you can have even for a simple XML file is quite large – Megha Oct 23 '15 at 05:34
  • Added more info to my question after trying the posted code – user2647763 - RIMD Jun 30 '20 at 01:47

1 Answer

I have only just seen this. I wrote something in Python that does this; it writes a pipe-delimited CSV file. Feel free to use it. I am happy to answer any questions, but don't expect an immediate response.

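from lxml import etree


def parseXML(xmlFile, outputFile):
    """
    Write one "XPath|text" line to outputFile for every node in xmlFile.
    """
    # Parse straight from the file so lxml handles the encoding declaration;
    # etree.fromstring() rejects a str that carries an encoding declaration.
    tree = etree.parse(xmlFile)
    root = tree.getroot()

    with open(outputFile, 'w', encoding='utf-8') as f:
        f.write("%s|%s\n" % ("Field", "Value"))
        for e in root.iter():
            # e.text is None for empty elements; strip layout whitespace
            # so a value never spills onto a second output line
            text = (e.text or "").strip()
            f.write("%s|%s\n" % (tree.getpath(e), text))


if __name__ == "__main__":
    print('Loading variables...')
    input_file = '16a.xml'  # avoid shadowing the built-in input()
    output_file = input_file + '.csv'

    parseXML(input_file, output_file)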
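
On the follow-up about the /*/*[1] style paths and the missing attributes: as far as I can tell, getpath() falls back to * steps when the elements sit in a default (unprefixed) namespace, and the script above only ever writes e.text, so attributes are never visited. Below is a rough sketch, not a tested drop-in, that builds the paths by hand from the local element names, adds one row per attribute, and counts repeating groups. The helper names (readable_path, xpath_rows, group_counts) are my own, not lxml functions. Note the assumption that dropping the namespace is acceptable: the resulting paths are for reading, and will not evaluate as real XPath against a namespaced document without extra work.

```python
import re
from collections import Counter

from lxml import etree


def local_name(node_or_tag):
    # Drop any "{namespace}" part from an element or attribute name.
    return etree.QName(node_or_tag).localname


def readable_path(node):
    """Return a /A/B[2]/C style path built from local names (hypothetical helper)."""
    parts = []
    while node is not None:
        name = local_name(node)
        parent = node.getparent()
        if parent is not None:
            # Siblings with the same local name; comments and PIs are skipped.
            same = [c for c in parent
                    if isinstance(c.tag, str) and local_name(c) == name]
            if len(same) > 1:
                name = "%s[%d]" % (name, same.index(node) + 1)
        parts.append(name)
        node = parent
    return "/" + "/".join(reversed(parts))


def xpath_rows(root):
    """List (path, value) pairs for every element and every attribute."""
    rows = []
    for e in root.iter():
        if not isinstance(e.tag, str):
            continue  # comment or processing instruction
        path = readable_path(e)
        rows.append((path, (e.text or "").strip()))
        for attr, value in e.attrib.items():
            rows.append(("%s/@%s" % (path, local_name(attr)), value))
    return rows


def group_counts(root):
    """Count elements per path with the [n] positions removed."""
    counts = Counter()
    for e in root.iter():
        if isinstance(e.tag, str):
            counts[re.sub(r"\[\d+\]", "", readable_path(e))] += 1
    return counts
```

To use it, parse the file as before (root = etree.parse(input_file).getroot()), write each pair from xpath_rows(root) as "%s|%s\n", and look up the repeating group in group_counts(root). Rebuilding the sibling list for every node is slow on very large files, so treat this as a starting point.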
IbnStack