
I am a newbie with one week of experience writing Python scripts.

I am trying to write a generic parser (a library for all my future jobs) that parses any input XML without prior knowledge of its tags. It should:

  • Parse the input XML.
  • Get the values from the XML and set them based on the tags.
  • Use these values in the rest of the job.

I am using the "xml.etree.ElementTree" library, and I am able to parse the XML in the way shown below.

#!/usr/bin/python

import os
import xml.etree.ElementTree as etree
import logging


logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

logger.info('start reading XML property file')
filename = "mood_ib_history_parameters_DEV.xml"

logger.info('getting the current location')
__currentlocation__ = os.getcwd()
__fullpath__ = os.path.join(__currentlocation__,filename)

logger.info('start parsing the XML property file')
tree = etree.parse(__fullpath__)
root = tree.getroot()

hive_db = root.find("hive_db").text
EDGE_HIVE_CONN = root.find("EDGE_HIVE_CONN").text
target_dir = root.find("target_dir").text
to_email_alias = root.find("to_email_alias").text
to_email_cc = root.find("to_email_cc").text
from_email_alias = root.find("from_email_alias").text
dburl = root.find("dburl").text
SQOOP_EDGE_CONN = root.find("SQOOP_EDGE_CONN").text
user_name = root.find("user_name").text
password = root.find("password").text
IB_log_table = root.find("IB_log_table").text
SR_DG_master_table = root.find("SR_DG_master_table").text
SR_DG_table = root.find("SR_DG_table").text

logger.info('Hive DB %s', hive_db)
logger.info('Edge Hive Connection %s', EDGE_HIVE_CONN)
logger.info('Target Directory %s', target_dir)
logger.info('To Email address %s', to_email_alias)
logger.info('CC Email address %s', to_email_cc)
logger.info('From Email address %s', from_email_alias)
logger.info('DB URL %s',dburl)
logger.info('Sqoop Edge node connection %s',SQOOP_EDGE_CONN)
logger.info('Log table name %s',IB_log_table)
logger.info('Master table name %s',SR_DG_master_table)
logger.info('Data governance table name %s',SR_DG_table)

Now the question is: how do I parse an XML file without any knowledge of its tags and elements, and still use the values? I have gone through multiple tutorials, but all of them parse the XML by naming the tags explicitly, like this:

SQOOP_EDGE_CONN = root.find("SQOOP_EDGE_CONN").text

Can anybody point me to the right tutorial, library, or code snippet for parsing XML dynamically?

wandermonk
  • Do you need `parsing` (creating an etree from an XML file) or `searching` (finding elements in the etree)? `etree` has functions other than `find`. – furas Feb 03 '16 at 07:20

2 Answers


I think the official documentation is pretty clear and contains some examples: https://docs.python.org/3/library/xml.etree.elementtree.html

The main part you need to implement is a loop over the child nodes (potentially recursive):

for child in root:
    # child.tag contains the tag name, child.attrib contains the attributes
    print(child.tag, child.attrib)
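
The recursive variant mentioned above could look roughly like the sketch below. The helper name `walk` and the sample XML are made up for illustration, and the code assumes Python 3 (it uses `yield from`).

```python
import xml.etree.ElementTree as etree


def walk(element, depth=0):
    """Yield (depth, tag, attrib, text) for an element and all its descendants."""
    # element.text is None for empty elements, so fall back to "".
    text = (element.text or "").strip()
    yield depth, element.tag, element.attrib, text
    for child in element:
        # Recurse into each child, one level deeper.
        yield from walk(child, depth + 1)


# Hypothetical sample input, not taken from the question's file.
SAMPLE_XML = """
<config>
    <hive_db>sales</hive_db>
    <email enabled="true">
        <to_email_alias>team@example.com</to_email_alias>
    </email>
</config>
"""

root = etree.fromstring(SAMPLE_XML)
visited = list(walk(root))
for depth, tag, attrib, text in visited:
    # Indent by depth so the nesting is visible in the output.
    print("  " * depth + tag, attrib, text)
```

If the nesting depth does not matter, `root.iter()` from the same module visits the element and all its descendants without a hand-written recursion.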
Zbynek Vyskovsky - kvr000

Parsing itself is as easy as calling etree.parse(path).

Once you have the root in hand via tree.getroot(), you can iterate over its children with an ordinary Python "for ... in" loop:

for child_node in tree.getroot():
    # print() with a single argument works on both Python 2 and Python 3
    print(child_node.text)

To reach the elements nested inside each child node, apply the same trick and iterate over the child node itself. Each element's name is available as child_node.tag, so you can walk every element in the XML without knowing the tag names in advance.
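
For a flat property file like the one in the question, one possible approach is to collect every tag and its text into a dictionary and look values up by name later. This is only a sketch: the function name `load_properties` and the sample values are hypothetical, and it assumes each tag appears once directly under the root (a repeated tag would overwrite the earlier value).

```python
import xml.etree.ElementTree as etree


def load_properties(root):
    """Map the tag of each direct child of root to its stripped text."""
    properties = {}
    for child in root:
        # child.text is None for empty elements, so fall back to "".
        properties[child.tag] = (child.text or "").strip()
    return properties


# Hypothetical sample input; a real job would use etree.parse(path).getroot().
SAMPLE_XML = """
<parameters>
    <hive_db>mood_dev</hive_db>
    <target_dir>/tmp/out</target_dir>
    <user_name>etl_user</user_name>
</parameters>
"""

props = load_properties(etree.fromstring(SAMPLE_XML))
for tag, value in props.items():
    print(tag, "=", value)
```

The rest of the job can then read values such as props["hive_db"] without a separate root.find(...) call for every tag.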

Matanoga