
Suppose I have an XML file like this (bookstore.xml):

<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>

I want to delete the book element whose author is "J K. Rowling".
I know I can get all the author elements like this (Jython):

from javax.xml.parsers import DocumentBuilderFactory
from java.io import File

docFactory = DocumentBuilderFactory.newInstance()
docBuilder = docFactory.newDocumentBuilder()
doc = docBuilder.parse(File("bookstore.xml"))
list = doc.getElementsByTagName("author")

I want to write the modified XML tree to bookstore.xml.

Thanks!

mzjn
vasu1486

3 Answers


Instead of working with the org.w3c.dom.* and javax.xml.* Java APIs, I would suggest using ElementTree. This library is supported in Jython and simplifies things greatly.

from xml.etree import ElementTree as ET

root = ET.parse("bookstore.xml").getroot()
books = root.findall("book")

for book in books:
    if book.findtext("author") == "J K. Rowling":
        print("Found!")
        root.remove(book)

ET.ElementTree(root).write("output.xml")

Tested with Jython 2.5.2 (and CPython 2.7.2).
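One limitation of the loop above: findtext("author") returns only the text of the first <author> child, so it would miss a match in a multi-author book such as "XQuery Kick Start". The following is a minimal sketch, written for CPython 3 and not tested on Jython, that checks every author and, as the question asks, writes the result back over the original file. The helper names remove_books_by_author and remove_books_in_file are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_books_by_author(root, author):
    """Remove each <book> child of root having an <author> equal to author; return the count."""
    removed = 0
    # findall() returns a new list, so calling root.remove() inside the loop is safe
    for book in root.findall("book"):
        # Check every <author> element, not only the first one
        if any(a.text == author for a in book.findall("author")):
            root.remove(book)
            removed += 1
    return removed


def remove_books_in_file(path, author):
    """Parse path, drop the matching books, and overwrite path if anything changed."""
    tree = ET.parse(path)
    removed = remove_books_by_author(tree.getroot(), author)
    if removed:
        tree.write(path)  # overwrite the original file with the modified tree
    return removed
```

For the file in the question, remove_books_in_file("bookstore.xml", "J K. Rowling") should remove the single Harry Potter book and rewrite bookstore.xml in place.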

mzjn

Here are the steps in an interactive Python 2.7 session. I didn't turn them into a script because they depend heavily on your XML structure (for example, the index in bookstore[3] is hard-coded).

>>> from xml.dom import minidom
>>> xmldoc = minidom.parse('a.xml')
>>> root = xmldoc.documentElement
>>> nodeList = xmldoc.childNodes
>>> bookstore = nodeList[0].childNodes
>>> bookstore
[<DOM Text node "u'\n'">, <DOM Element: book at 0x2544580>, <DOM Text node "u'\n'">, <DOM Element: book at 0x2544a30>, <DOM Text node "u'\n'">, <DOM Element: book at 0x2544e90>, <DOM Text node "u'\n'">, <DOM Element: book at 0x25475d0>, <DOM Text node "u'\n'">]
>>> bookstore[3].getElementsByTagName("author")[0].childNodes[0].data
u'J K. Rowling'
>>> nodeList[0].removeChild(bookstore[3])
>>> with open('output.xml', 'w') as f:
...     f.write(xmldoc.saveXML(nodeList[0]))
...
>>> 

Results:

<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>

<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
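The session above relies on the hard-coded index bookstore[3], which only works for this exact file layout. Below is a minimal sketch of the same minidom approach that locates books by their author text instead and writes the document back to disk. It assumes CPython 3, and the function names remove_books_by_author_dom and remove_books_in_file_dom are hypothetical.

```python
from xml.dom import minidom


def remove_books_by_author_dom(doc, author):
    """Remove every <book> that has an <author> whose text equals author; return the count."""
    removed = 0
    # minidom's getElementsByTagName() builds a plain list up front (it is not
    # live), so removing nodes while looping over it does not skip anything.
    for book in doc.getElementsByTagName("book"):
        for a in book.getElementsByTagName("author"):
            # Join the text children, since an element's text may span several nodes
            text = "".join(n.data for n in a.childNodes if n.nodeType == n.TEXT_NODE)
            if text == author:
                book.parentNode.removeChild(book)
                removed += 1
                break  # this book is gone; move on to the next one
    return removed


def remove_books_in_file_dom(path, author):
    """Parse path, drop the matching books, and overwrite path if anything changed."""
    doc = minidom.parse(path)
    removed = remove_books_by_author_dom(doc, author)
    if removed:
        with open(path, "w", encoding="utf-8") as f:
            doc.writexml(f)  # writes an XML declaration followed by the tree
    return removed
```

As in the Results above, the whitespace text nodes around the removed book stay in place, so the output keeps an empty line where the book used to be.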

I find this DOM module cumbersome to use. It is easier to try an alternative such as xml.etree.ElementTree in Python.

jinghli

The following worked:

books = doc.getElementsByTagName("book")  # doc as parsed in the question
# This NodeList is live: removing a <book> shifts the indices of later ones,
# so walk it backwards to avoid skipping elements.
for i in range(books.getLength() - 1, -1, -1):
    node = books.item(i)
    children = node.getChildNodes()
    for j in range(children.getLength()):
        child = children.item(j)
        # Whitespace text nodes are named "#text", so only <author> elements match
        if child.getNodeName() == "author" and child.getTextContent() == "J K. Rowling":
            print("Found!")
            node.getParentNode().removeChild(node)
            break  # this book is removed; go on to the next one
vasu1486