
I want to include one XML file in another and parse the result with Python. I am trying to achieve this with XInclude. file1.xml looks like this:

<?xml version="1.0"?>
<root>
  <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="file2.xml" parse="xml" />
  </document>
  <test>some text</test>
</root>

and file2.xml looks like this:

<para>This is a paragraph.</para>

Now in my Python code I tried to access it like this:

from xml.etree import ElementTree, ElementInclude

tree = ElementTree.parse("file1.xml")
root = tree.getroot()
for child in root:  # Element.getchildren() was removed in Python 3.9
    print(child.tag)

It prints the tags of all child elements of root:

document
test

But when I try to access the child elements directly, like this:

print(root.document)
print(root.test)

It says root doesn't have children named test or document. How, then, am I supposed to access the content of file2.xml?
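For context, a minimal sketch of why those lookups fail, using the file1.xml content from above inline and with no include processing yet: ElementTree's Element does not expose child elements as Python attributes (that attribute-style access is what lxml.objectify provides), so children have to be reached with find(), findtext() or plain iteration.

```python
import xml.etree.ElementTree as ET

# file1.xml content from the question, inlined (no XInclude processing yet)
root = ET.fromstring("""<root>
  <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="file2.xml" parse="xml" />
  </document>
  <test>some text</test>
</root>""")

# Element does not map child elements to attributes, so this raises AttributeError
try:
    root.test
except AttributeError as exc:
    print("AttributeError:", exc)

# Children are reached by path lookups or by iterating over the element instead
print(root.find("test").text)         # some text
print(root.findtext("test"))          # some text
print([child.tag for child in root])  # ['document', 'test']

# Until the include is expanded, <document> only holds the xi:include element itself
print(root.find("document")[0].tag)   # {http://www.w3.org/2001/XInclude}include
```

In other words, even once file2.xml is pulled in, its <para> element has to be looked up the same way (for example root.find("document/para")), not as root.para.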

I know that I can access XML elements from Python with a schema, like this:

    from lxml import etree, objectify

    schema = etree.XMLSchema(objectify.fromstring(configSchema))
    xmlParser = objectify.makeparser(schema=schema)
    cfg = objectify.fromstring(xmlContents, xmlParser)
    print(cfg.elementName)  # access an element as an attribute

But since one XML file is included in another here, I am confused about how to write the schema. How can I solve this?

Hari Krishnan
  • I wanted to parse this XML from Python. Can I parse it without a schema? – Hari Krishnan Oct 14 '19 at 06:36
  • You don't need a schema. You do need a tool or library that can process XInclude. lxml can do it: https://lxml.de/api.html#xinclude-and-elementinclude. – mzjn Oct 14 '19 at 06:43
  • ElementTree also has some XInclude support: https://docs.python.org/3/library/xml.etree.elementtree.html#xinclude-support – mzjn Oct 14 '19 at 07:06
  • I was going through that document. But how can I access the content inside the included file? In the example given there, when I try to access root.para, it says the object has no attribute para. But para is the name of the element in the included XML. – Hari Krishnan Oct 14 '19 at 07:13

3 Answers


Not sure why you want to use XInclude; including an XML file in another one is a basic mechanism of SGML and XML, and can be achieved without XInclude as simply as:

<!DOCTYPE root [
  <!ENTITY externaldoc SYSTEM "file2.xml">
]>
<root>
  <document>
    &externaldoc;
  </document>
  <test>some text</test>
</root>
imhotap
  • The one thing to make sure of is that the included file must not have an XML or DOCTYPE declaration on it. If you've been using one for editing the fragment, remove it before using the file in this way. Yes, this is a pain in the butt, but if you have lots of inclusions like this, write a script to strip off the declaration (and paste it back on again for editing). // So I wanted to use XInclude – Hari Krishnan Oct 14 '19 at 11:22
  • And the problem is with how to access elements in the included file from including file's root object in python. – Hari Krishnan Oct 14 '19 at 11:24
  • @HariKrishnan wait, are you saying Python exposes XML differently depending on whether it uses entities or not? That would be news to me (but then I haven't used Python for a long while) – imhotap Oct 14 '19 at 13:07
  • @iamhotap Please read the question carefully and reply to what is asked. 1) It says root doesn't have children named test or document. Then how am I supposed to access the content in file2.xml? 2) But since here one XML file is included in another, I am confused about how to write the schema! You can refer to balderman's answer for a better understanding. – Hari Krishnan Oct 15 '19 at 04:06

Below is an example that merges the two documents with ElementTree:

import xml.etree.ElementTree as ET


xml1 = '''<?xml version="1.0"?>
<root>
  <test>some text</test>
</root>'''

xml2 = '''<para>This is a paragraph.</para>'''

root1 = ET.fromstring(xml1)
root2 = ET.fromstring(xml2)

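# Insert the second document's root element as the first child of root1
root1.insert(0, root2)

# Search the merged tree for the <para> element and read its text
para_value = root1.find('.//para').text
print(para_value)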

Output:

This is a paragraph.
balderman
  • The problem is with how to access elements in the included file from including file's root object in python – Hari Krishnan Oct 14 '19 at 11:24
  • After you merge the 2 files you can access any element. What do you want to access? – balderman Oct 14 '19 at 11:30
  • In the given example, I want to access content in para element. – Hari Krishnan Oct 14 '19 at 11:33
  • @HariKrishnan code was modified in order to show how to access the para value. – balderman Oct 14 '19 at 11:47
  • The question is how to include one XML file in another and access the included content from the outer XML. But here we are inserting one XML document's content into another from Python itself. Anyway, to access the inner element I can use root1.find('.//tagName').text. Thanks for that – Hari Krishnan Oct 14 '19 at 12:08

You need to make xml.etree include the files referenced with xi:include. I have added the key line to your original example:

from xml.etree import ElementTree, ElementInclude

tree = ElementTree.parse("file1.xml")
root = tree.getroot()

# Here you make ElementTree actually include every referenced file:
# each xi:include element is replaced in place by the content it points to
ElementInclude.include(root)

# And now you are good to go
for child in root:  # Element.getchildren() was removed in Python 3.9
    print(child.tag)

For a detailed reference about XInclude processing in Python, see the XInclude support section of the official Python documentation: https://docs.python.org/3/library/xml.etree.elementtree.html
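A fuller sketch of the same approach, made self-contained by writing the question's two files to a temporary directory. One assumption worth flagging: the default loader opens each href as given, so a relative href is resolved against the current working directory rather than against file1.xml. The hypothetical make_loader helper below therefore joins the href with the including file's directory before delegating to ElementInclude.default_loader (Python 3.9+ also accepts a base_url argument to include() for a similar purpose). After expansion, the included <para> sits inside <document> and is reached with find(), not as an attribute:

```python
import os
import tempfile
from xml.etree import ElementTree, ElementInclude

# The two files from the question
FILE1 = """<?xml version="1.0"?>
<root>
  <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="file2.xml" parse="xml" />
  </document>
  <test>some text</test>
</root>
"""
FILE2 = "<para>This is a paragraph.</para>\n"


def make_loader(base_dir):
    """Hypothetical helper: resolve relative hrefs against base_dir
    instead of the current working directory."""
    def loader(href, parse, encoding=None):
        return ElementInclude.default_loader(
            os.path.join(base_dir, href), parse, encoding)
    return loader


def load_with_includes(path):
    """Hypothetical helper: parse path and expand its xi:include elements."""
    root = ElementTree.parse(path).getroot()
    # Replace every xi:include element in place with the referenced content
    ElementInclude.include(root, loader=make_loader(os.path.dirname(path)))
    return root


with tempfile.TemporaryDirectory() as tmp:
    for name, text in (("file1.xml", FILE1), ("file2.xml", FILE2)):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)
    root = load_with_includes(os.path.join(tmp, "file1.xml"))

print([child.tag for child in root])    # ['document', 'test']
print(root.find("document/para").text)  # This is a paragraph.
```

The loader is only needed when the script is not run from the directory that contains file1.xml; otherwise ElementInclude.include(root) on its own behaves the same.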