1

I am new to Python and I'd like to remove the element below from an XML file using Python.

<action name="error_mail">...........</action>

Input file:

    <action name="error_mail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>abc@xyz.com</to>
            <cc>abc@xyz.com</cc>
            <subject>Batch Failed</subject>
            <body>Batch Failed at ${node}</body>
        </email>
        <ok to="killjob"/>
        <error to="killjob"/>
    </action>
    <action name="succeed_mail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>abc@xyz.com</to>
            <cc>abc@xyz.com</cc>
            <subject>Batch Succeed</subject>
            <body>Batch completed</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>
pravek

2 Answers

0

Using the ElementTree module from the standard library:

import xml.etree.ElementTree as ET

xml = '''<r> <action name="error_mail">
        <email xmlns="uri:oozie:email-action:0.1">
          <to>abc@xyz.com</to>
          <cc>abc@xyz.com</cc>
          <subject>Batch Failed</subject>
          <body>Batch Failed at ${node}</body>
        </email>
        <ok to="killjob"/>
        <error to="killjob"/>
      </action>
    <action name="succeed_mail">
        <email xmlns="uri:oozie:email-action:0.1">
          <to>abc@xyz.com</to>
          <cc>abc@xyz.com</cc>
          <subject>Batch Succeed</subject>
          <body>Batch completed</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
      </action></r>'''

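root = ET.fromstring(xml)
# remove() only accepts a direct child of root, so search one level deep.
action = root.find("./action[@name='error_mail']")
# find() returns None when nothing matches; guard before removing.
if action is not None:
    root.remove(action)
ET.dump(root)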

Output:

<r xmlns:ns0="uri:oozie:email-action:0.1"> <action name="succeed_mail">
        <ns0:email>
          <ns0:to>abc@xyz.com</ns0:to>
          <ns0:cc>abc@xyz.com</ns0:cc>
          <ns0:subject>Batch Succeed</ns0:subject>
          <ns0:body>Batch completed</ns0:body>
        </ns0:email>
        <ok to="end" />
        <error to="end" />
      </action></r>
balderman
  • I am looking to remove both elements, "error_mail" and "succeed_mail", from this XML file using a Python script. There are around 100 similar XML files stored in a directory. These files need to be read from path = /c/xml_files/*.xml, and the elements below need to be removed from all of them. Sample_input.xml -- -- -- -- Desired output from xml files :- – pravek Dec 10 '20 at 07:49
  • Your question asks about the error email, so that is what the code handles. Edit the question if you have other needs, or try to extend the answer to fit them. Did you try it? Did it work for you? – balderman Dec 10 '20 at 07:55
  • Yes, that worked like magic for a single file! But I have an additional requirement: deleting the element from multiple XML files in a specific directory. I would appreciate it if you could help me with a Python script that removes just the error email (I will adapt the script for the succeed email) from multiple XML files. Thanks in advance! – pravek Dec 10 '20 at 10:57
  • You have two challenges: 1) loop over multiple XML files; 2) for each of those files, remove the elements you need to remove. Start with the code I have submitted and work out how to remove everything you need from a single file. – balderman Dec 10 '20 at 12:08
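
The two steps from the last comment (loop over the files, then remove the elements from each one) could be sketched with only the standard library roughly as follows. This is a minimal sketch, not the answerer's code: the function names remove_actions and clean_directory are made up for illustration, and it assumes each file is well-formed XML with a single root element (the sample in the question has no root, which is why the answer wraps it in <r>). Note that ElementTree drops comments and may rewrite namespace prefixes (for example to ns0) when it saves the file.

```python
import glob
import os
import xml.etree.ElementTree as ET


def remove_actions(xml_path, names=("error_mail",)):
    """Hypothetical helper: delete every <action> whose name is in `names`.

    Rewrites the file in place and returns the number of elements removed.
    """
    tree = ET.parse(xml_path)
    removed = 0
    # Element.remove() only works on a direct child, so visit every possible
    # parent. list() takes a snapshot so removals do not disturb the iteration.
    for parent in list(tree.iter()):
        for child in list(parent):
            # Compare the local tag name so a namespaced <action> also matches.
            local_name = child.tag.rsplit("}", 1)[-1]
            if local_name == "action" and child.get("name") in names:
                parent.remove(child)
                removed += 1
    if removed:
        # Only touch files that actually changed.
        tree.write(xml_path, encoding="utf-8", xml_declaration=True)
    return removed


def clean_directory(directory, names=("error_mail",)):
    """Hypothetical helper: run remove_actions on every *.xml file in `directory`."""
    results = {}
    for xml_path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        results[xml_path] = remove_actions(xml_path, names)
    return results
```

With the path from the comment, clean_directory("/c/xml_files") would process each file and return a dict mapping file path to the number of removed elements; passing names=("error_mail", "succeed_mail") would remove both actions. Since the files are overwritten, it is worth trying this on a copy of the directory first.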
0

Try this.

from simplified_scrapy import SimplifiedDoc, utils

files = utils.getSubFile('/c/xml_files/',end='.xml') # Get all XML files
for f in files:
    xml = utils.getFileContent(f) # Get file content
    doc = SimplifiedDoc(xml)
    error_mail = doc.select('action@name=error_mail')
    if error_mail:
        error_mail.remove() # Delete node
        utils.saveFile(f, doc.html) # Save file
yazz