
I'm trying to use ElementTree to locate an element of interest in an XML document and remove its entire group (i.e. its parent) from the XML.

import xml.etree.ElementTree as ET
from lxml import etree 

copasiML_str= IA.read_copasiML_as_string(model_file) # Reads XML as string
copasiML=ET.fromstring(copasiML_str) # parse XML to etree

for i in copasiML.findall(".//*[@name='ObjectCN']"): # locate element 
    if '[v18]' in  i.attrib['value']:           #search for 'v18' 
        if 'Parameter=V' in i.attrib['value']:   #search for 'Parameter=V'
            print i.attrib['value']             #Element identified
            parent = i.getparent()   #gets the parent of identified
            copasiML.remove(parent) # This does not work

This code identifies the element and gets the parent of the element I want to remove. Then it gives me an error when I try to remove the element:

ValueError: Element is not a child of this node.

The XML in question is fairly complicated. Here is a snippet:

<ParameterGroup name="FitItem">
            <ParameterGroup name="Affected Cross Validation Experiments">
            </ParameterGroup>
            <ParameterGroup name="Affected Experiments">
              <Parameter name="Experiment Key" type="key" value="Experiment_1"/>
              <Parameter name="Experiment Key" type="key" value="Experiment_2"/>
              <Parameter name="Experiment Key" type="key" value="Experiment_4"/>
            </ParameterGroup>
            <Parameter name="LowerBound" type="cn" value="1e-06"/>
            <Parameter name="ObjectCN" type="cn" value="CN=Root,Model=NoName,Vector=Reactions[V18],ParameterGroup=Parameters,Parameter=V,Reference=Value"/>
            <Parameter name="StartValue" type="float" value="0.1852208634119804"/>
            <Parameter name="UpperBound" type="cn" value="100"/>
          </ParameterGroup>

There are many 'FitItem' parameter groups. I'm trying to locate the one with '[V18]' and 'Parameter=V' and delete the entire FitItem. Would anybody know how to do this?

Thanks

CiaranWelsh
  • Sorry, it was a pasting error. The root is called copasiML. – CiaranWelsh Jul 11 '15 at 15:02
  • Given that sample XML, parent of `*[@name='ObjectCN']` is the root element, that means you're trying to delete the entire XML which will make it not well-formed XML – har07 Jul 11 '15 at 15:09
  • Please recheck, I suspect in the actual code you use `lxml` (`etree.fromstring()` or similar) instead to populate `copasiML` variable, because built-in `ElementTree` doesn't have method `getparent()` – har07 Jul 11 '15 at 15:19

2 Answers


If the XML you posted is only part of a larger document and <ParameterGroup name="FitItem"> isn't actually the root element, you should be able to remove the element referenced by the parent variable from its own parent (i.e. the grandparent of i), like so:

......
parent = i.getparent()
parent.getparent().remove(parent)

Otherwise, you can't remove parent, because it references the root element, and a well-formed XML document requires exactly one root element.

Here is a working demo:

from lxml import etree

xml = '''<root>
    <ParameterGroup name="FitItem">
            <ParameterGroup name="Affected Cross Validation Experiments">
            </ParameterGroup>
            <ParameterGroup name="Affected Experiments">
              <Parameter name="Experiment Key" type="key" value="Experiment_1"/>
              <Parameter name="Experiment Key" type="key" value="Experiment_2"/>
              <Parameter name="Experiment Key" type="key" value="Experiment_4"/>
            </ParameterGroup>
            <Parameter name="LowerBound" type="cn" value="1e-06"/>
            <Parameter name="ObjectCN" type="cn" value="CN=Root,Model=NoName,Vector=Reactions[V18],ParameterGroup=Parameters,Parameter=V,Reference=Value"/>
            <Parameter name="StartValue" type="float" value="0.1852208634119804"/>
            <Parameter name="UpperBound" type="cn" value="100"/>
          </ParameterGroup>
</root>'''
copasiML=etree.fromstring(xml)
query = "//*[@name='ObjectCN'][contains(@value,'[V18]')][contains(@value,'Parameter=V')]"
for i in copasiML.xpath(query):
    parent = i.getparent()
    parent.getparent().remove(parent)

# tostring() returns bytes; decode so the output prints as plain text
print(etree.tostring(copasiML).decode())

Output:

<root>
    </root>
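
If you would rather stay with the built-in xml.etree.ElementTree, which has no getparent() (as noted in the comments on the question), a parent map can stand in for it. This is a minimal sketch, not tested against a real COPASI file; the <root> wrapper, the trimmed-down XML, the second [V19] FitItem and the name parent_of are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample; the <root> wrapper and the V19 item are illustrative assumptions.
xml = '''<root>
  <ParameterGroup name="FitItem">
    <Parameter name="LowerBound" type="cn" value="1e-06"/>
    <Parameter name="ObjectCN" type="cn" value="CN=Root,Model=NoName,Vector=Reactions[V18],ParameterGroup=Parameters,Parameter=V,Reference=Value"/>
  </ParameterGroup>
  <ParameterGroup name="FitItem">
    <Parameter name="ObjectCN" type="cn" value="CN=Root,Model=NoName,Vector=Reactions[V19],ParameterGroup=Parameters,Parameter=V,Reference=Value"/>
  </ParameterGroup>
</root>'''
root = ET.fromstring(xml)

# ElementTree elements do not know their parent, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Collect the matches first so the tree is not modified while it is being searched.
matches = [p for p in root.findall(".//*[@name='ObjectCN']")
           if '[V18]' in p.get('value', '') and 'Parameter=V' in p.get('value', '')]

for param in matches:
    fit_item = parent_of[param]            # the enclosing FitItem group
    parent_of[fit_item].remove(fit_item)   # remove it from its own parent

# Only the V19 FitItem should be left.
remaining = [p.get('value') for p in root.findall(".//*[@name='ObjectCN']")]
print(len(root), remaining)
```

The lookup has to be built before anything is removed, and it goes stale once the tree changes, so rebuild it if you need parents again afterwards.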
har07
  • Hi har07. I thought something like this should work, however I am getting an AttributeError: 'builtin_function_or_method' object has no attribute 'remove' – CiaranWelsh Jul 11 '15 at 15:43
  • @user3059024 worked for me, try to run the demo code – har07 Jul 11 '15 at 15:57
  • Side note : BS is a possible alternative, but not really better than lxml in general. http://stackoverflow.com/questions/31351856/are-there-any-benefits-of-using-beautiful-soup-to-parse-xml-over-using-lxml-alon – har07 Jul 11 '15 at 16:01
  • Your strategy worked. The only change I made was to match the entire value of 'value' (i.e. CN=Root,Model=NoName,Vector=Reactions[V18],ParameterGroup=Parameters,Parameter=V,Reference=Value) and it worked. Thanks for the help. Much appreciated. – CiaranWelsh Jul 11 '15 at 16:25
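
The exact-match variant described in the last comment might look like the sketch below. It uses the built-in ElementTree so that it runs without extra packages. The full value string is copied from the question's snippet; the <root> wrapper, the shortened XML, the made-up Parameter=Vmax item and the names TARGET and removed are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Full ObjectCN value from the question; matching on it exactly avoids false hits.
TARGET = ("CN=Root,Model=NoName,Vector=Reactions[V18],"
          "ParameterGroup=Parameters,Parameter=V,Reference=Value")

# Illustrative input: the second FitItem (Parameter=Vmax) is invented to show a
# value that a substring test for 'Parameter=V' would also match.
xml = '''<root>
  <ParameterGroup name="FitItem">
    <Parameter name="ObjectCN" type="cn" value="CN=Root,Model=NoName,Vector=Reactions[V18],ParameterGroup=Parameters,Parameter=V,Reference=Value"/>
  </ParameterGroup>
  <ParameterGroup name="FitItem">
    <Parameter name="ObjectCN" type="cn" value="CN=Root,Model=NoName,Vector=Reactions[V18],ParameterGroup=Parameters,Parameter=Vmax,Reference=Value"/>
  </ParameterGroup>
</root>'''
root = ET.fromstring(xml)

removed = 0
# Treat every element as a possible container of FitItem groups.
for container in list(root.iter()):
    # findall() returns a list, so removing children inside the loop is safe.
    for item in container.findall("ParameterGroup[@name='FitItem']"):
        values = [p.get('value') for p in item.findall("Parameter[@name='ObjectCN']")]
        if TARGET in values:          # list membership means exact equality
            container.remove(item)
            removed += 1

left = [p.get('value') for p in root.findall(".//Parameter[@name='ObjectCN']")]
print(removed, left)
```

Because each FitItem is removed through the element that directly contains it, no getparent() call or parent lookup is needed here.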

Once I learned BeautifulSoup, I never went back to using etree.

Note:

  1. I added the root copasiML into your XML based on the comment
  2. I added another FitItem with Datafireball as its text, just to show that we locate the right element in the end.
  3. In BeautifulSoup, I used two ways of locating elements, find(lambda) and find_parent(args...), because there are quite a few rules for locating the target parameter, while the logic for finding its parent is fairly simple.

Here is the code:

from bs4 import BeautifulSoup
myString = """
<ParameterGroup name="copasiML">
<ParameterGroup name="FitItem">
    <ParameterGroup name="Affected Cross Validation Experiments"></ParameterGroup>
    <ParameterGroup name="Affected Experiments">
      <Parameter name="Experiment Key" type="key" value="Experiment_1"/>
      <Parameter name="Experiment Key" type="key" value="Experiment_2"/>
      <Parameter name="Experiment Key" type="key" value="Experiment_4"/>
    </ParameterGroup>
    <Parameter name="LowerBound" type="cn" value="1e-06"/>
    <Parameter name="ObjectCN" type="cn" value="CN=Root,Model=NoName,Vector=Reactions[V18],ParameterGroup=Parameters,Parameter=V,Reference=Value"/>
    <Parameter name="StartValue" type="float" value="0.1852208634119804"/>
    <Parameter name="UpperBound" type="cn" value="100"/>
</ParameterGroup>
<ParameterGroup name="FitItem">Datafireball</ParameterGroup>
</ParameterGroup>
"""
soup = BeautifulSoup(myString, "xml")

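def myfunc(e):
    # Match <Parameter name="ObjectCN" ...> tags whose value mentions V18.
    # Tag.get() returns None (or the given default) when the attribute is
    # missing, so no try/except is needed.
    return (e.name == 'Parameter'
            and e.get('name') == 'ObjectCN'
            and 'V18' in e.get('value', ''))

# find() accepts a function that takes a tag and returns True or False
target = soup.find(myfunc)
# walk up to the enclosing FitItem group and remove it from the tree
parent = target.find_parent('ParameterGroup', {'name':'FitItem'})
parent.decompose()

print(soup.prettify())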

This is the output:

<?xml version="1.0" encoding="utf-8"?>
<ParameterGroup name="copasiML">
 <ParameterGroup name="FitItem">
  Datafireball
 </ParameterGroup>
</ParameterGroup>
B.Mr.W.