I am a newbie with one week of experience writing Python scripts.
I am trying to write a generic parser (a library for all my future jobs) that parses any input XML without prior knowledge of its tags. It should:
- Parse input XML.
- Get the values from the XML and store them based on their tags.
- Use these values in the rest of the job.
I am using the "xml.etree.ElementTree" library, and I am able to parse the XML as shown below.
#!/usr/bin/python
import os
import xml.etree.ElementTree as etree
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.info('start reading XML property file')
filename = "mood_ib_history_parameters_DEV.xml"
logger.info('getting the current location')
__currentlocation__ = os.getcwd()
__fullpath__ = os.path.join(__currentlocation__,filename)
logger.info('start parsing the XML property file')
tree = etree.parse(__fullpath__)
root = tree.getroot()
hive_db = root.find("hive_db").text
EDGE_HIVE_CONN = root.find("EDGE_HIVE_CONN").text
target_dir = root.find("target_dir").text
to_email_alias = root.find("to_email_alias").text
to_email_cc = root.find("to_email_cc").text
from_email_alias = root.find("from_email_alias").text
dburl = root.find("dburl").text
SQOOP_EDGE_CONN = root.find("SQOOP_EDGE_CONN").text
user_name = root.find("user_name").text
password = root.find("password").text
IB_log_table = root.find("IB_log_table").text
SR_DG_master_table = root.find("SR_DG_master_table").text
SR_DG_table = root.find("SR_DG_table").text
logger.info('Hive DB %s', hive_db)
logger.info('Edge Hive Connection %s', EDGE_HIVE_CONN)
logger.info('Target Directory %s', target_dir)
logger.info('To Email address %s', to_email_alias)
logger.info('CC Email address %s', to_email_cc)
logger.info('From Email address %s', from_email_alias)
logger.info('DB URL %s', dburl)
logger.info('Sqoop Edge node connection %s', SQOOP_EDGE_CONN)
logger.info('Log table name %s', IB_log_table)
logger.info('Master table name %s', SR_DG_master_table)
logger.info('Data governance table name %s', SR_DG_table)
Now my question is: how do I parse an XML file without any prior knowledge of its tags and elements, and still use the values? I have gone through multiple tutorials, but all of them parse the XML by looking up known tag names, like this:
SQOOP_EDGE_CONN = root.find("SQOOP_EDGE_CONN").text
Can anybody point me to a suitable tutorial, library, or code snippet for parsing XML dynamically?
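For reference, here is a minimal sketch of the kind of dynamic parsing being asked about. It is not a definitive solution. It assumes the file is flat, as in the script above, with every value stored as the text of a direct child of the root. The function names `properties_from_root` and `load_properties` and the sample XML are hypothetical, made up for illustration. The idea is to iterate over the root's children and collect each tag and its text into a dictionary instead of naming the tags one by one.

```python
import xml.etree.ElementTree as etree


def properties_from_root(root):
    """Return {tag: text} for every direct child of the given root element."""
    properties = {}
    for child in root:
        # child.text is None for empty elements, so fall back to "".
        properties[child.tag] = (child.text or "").strip()
    return properties


def load_properties(path):
    """Parse the XML file at `path` and return its top-level values as a dict."""
    return properties_from_root(etree.parse(path).getroot())


# Hypothetical sample input, standing in for the real property file.
sample = """
<config>
    <hive_db>mood_dev</hive_db>
    <target_dir>/tmp/out</target_dir>
</config>
"""

props = properties_from_root(etree.fromstring(sample))
for tag, value in props.items():
    print(tag, value)
```

With this approach the rest of the job would read values as `props["hive_db"]` rather than from separate variables. If two children share the same tag, the later one overwrites the earlier one, and nested elements would need a recursive variant (for example, one built on `root.iter()`).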