0

I have this xml format.....

<event timestamp="0.447463" bustype="LIN" channel="LIN 1">  
 <col name="Time"/>  
 <col name="Start of Frame">0.440708</col>  
 <col name="Channel">LIN 1</col>  
 <col name="Dir">Tx</col>  
 <col name="Event Type">LIN Frame (Diagnostic Request)</col>  
 <col name="Frame Name">MasterReq_DB</col>  
 <col name="Id">3C</col>  
 <col name="Data">81 06 04 04 FF FF 50 4C</col>  
 <col name="Publisher">TestMaster (simulated)</col>  
 <col name="Checksum">D3 &quot;Classic&quot;</col>  
 <col name="Header Duration">2.090 ms (40.1 bits)</col>  
 <col name="Resp. Duration">4.688 ms (90.0 bits)</col>  
 <col name="Time difference">0.049987</col>  
 <empty/>  
</event>  

In above xml, i need to extract data associated with attribute 'name'
Am able to get all names but am unable to fetch >MasterReq_DB< field
Please help me ...
Thanks in advance

My python code is...

import sys 
import array
import string
from xml.dom.minidom import parse,parseString
from xml.dom import minidom                                              
input_file = open("test_input.txt",'r')                                                
alines = input_file.read()
word_lst = alines.split("'")
filename = word_lst[1]
pathname=word_lst[3]                                               
f = open(pathname,'r')
doc = minidom.parse(f)
node = doc.documentElement
events = doc.getElementsByTagName('event')
for event in events:
    #print (event)
    columns =  event.getElementsByTagName('col')
    for column in columns:
        #print (column)
        head = column.getAttribute('name')
        if (head == ('Frame Name')):
           print (head)
           request = head.firstChild.wholeText
           print (request)
print ("DOne")
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Rohit
  • 31
  • 4
  • What code have you tried? Have you looked at [elementtree](http://docs.python.org/py3k/library/xml.etree.elementtree.html) and [lxml](http://lxml.de/) (the latter being a more powerful extension overlapping in functionality with the former). – Martijn Pieters Jun 16 '12 at 09:14
  • please see my python code above... – Rohit Jun 16 '12 at 09:31
  • And `print (request)` outputs what exactly? Have you tried `print (repr(request))`? I'd strongly advise switching to `elementtree` as a vastly superior XML API for python. – Martijn Pieters Jun 16 '12 at 09:36
  • i get error as: Frame Name Traceback (most recent call last): File "C:\Users\rshirurm\Desktop\AD7180_aut\AD7180_auto.py", line 25, in request = head.firstChild.wholeText AttributeError: 'str' object has no attribute 'firstChild' – Rohit Jun 16 '12 at 09:51
  • 1
    There is your hint: `head` is a string (the value of the column attribute).. use `column.firstChild` perhaps? :-P – Martijn Pieters Jun 16 '12 at 09:54
  • Thanks a lot Martijn Pieters ... It worked :) :) – Rohit Jun 16 '12 at 09:57

1 Answers1

1

Here's a primer to get you started with lxml if you wish to:

In [1]: x = '''<event timestamp="0.447463" bustype="LIN" channel="LIN 1">  
   ...:  <col name="Time"/>  
   ...:  <col name="Start of Frame">0.440708</col>  
   ...:  <col name="Channel">LIN 1</col>  
   ...:  <col name="Dir">Tx</col>  
   ...:  <col name="Event Type">LIN Frame (Diagnostic Request)</col>  
   ...:  <col name="Frame Name">MasterReq_DB</col>  
   ...:  <col name="Id">3C</col>  
   ...:  <col name="Data">81 06 04 04 FF FF 50 4C</col>  
   ...:  <col name="Publisher">TestMaster (simulated)</col>  
   ...:  <col name="Checksum">D3 &quot;Classic&quot;</col>  
   ...:  <col name="Header Duration">2.090 ms (40.1 bits)</col>  
   ...:  <col name="Resp. Duration">4.688 ms (90.0 bits)</col>  
   ...:  <col name="Time difference">0.049987</col>  
   ...:  <empty/>  
   ...: </event> '''

In [2]: from lxml import etree

In [3]: tree = etree.fromstring(x)

In [4]: [elem.text for elem in tree.xpath('//*[@name]')]
Out[4]: 
[None,
 '0.440708',
 'LIN 1',
 'Tx',
 'LIN Frame (Diagnostic Request)',
 'MasterReq_DB',
 '3C',
 '81 06 04 04 FF FF 50 4C',
 'TestMaster (simulated)',
 'D3 "Classic"',
 '2.090 ms (40.1 bits)',
 '4.688 ms (90.0 bits)',
 '0.049987']

In [5]: [name for name in tree.xpath('//@name')]
Out[5]: 
['Time',
 'Start of Frame',
 'Channel',
 'Dir',
 'Event Type',
 'Frame Name',
 'Id',
 'Data',
 'Publisher',
 'Checksum',
 'Header Duration',
 'Resp. Duration',
 'Time difference']

To read from file instead of a string, use lxml.etree.parse function.

Here's a link to lxml tutorial. This one is a reference for XPath syntax.

Lev Levitsky
  • 63,701
  • 20
  • 147
  • 175
  • hey,what do you suggest me ? use lxml or DOM bcos this is just start of my work and i need parse xml files which are in Mbytes... – Rohit Jun 16 '12 at 10:16
  • I haven't got any experience with DOM at all, to be honest. `lxml` is quite good for parsing. For parsing files of several Gb in size I use the [`iterparse`](http://lxml.de/parsing.html#iterparse-and-iterwalk) method of `lxml`, works great. For smaller files something like the example in my answer is what I normally do. – Lev Levitsky Jun 16 '12 at 10:26
  • thanks for suggetion...How can i write output to excel2007...? – Rohit Jun 16 '12 at 10:29
  • @Rohit Take a look at [this question](http://stackoverflow.com/questions/4257771/python-writing-to-excel-2007-files-xlsx-files). – Lev Levitsky Jun 16 '12 at 10:32