Python Parse XML file for certain lines and output the line to Text widget

Question

I need to search a Windows msinfo file (.nfo) for certain lines and print them to a Text widget. I can print(line) every line in the file, and I can output every line to the Text widget, but as soon as I try to specify which lines to output it stops working. I assume this is because the file is XML, but the XML parsing tools I see for Python seem to look for lines like data=blah. The entries I'm looking for look like this when I open the file in a text editor:

    <Category name="Disks">
<Data>
<Item><![CDATA[Description]]></Item>
<Value><![CDATA[Disk drive]]></Value>
</Data>
<Data>
<Item><![CDATA[Manufacturer]]></Item>
<Value><![CDATA[(Standard disk drives)]]></Value>
</Data>
<Data>
<Item><![CDATA[Model]]></Item>
<Value><![CDATA[TOSHIB  MK1652GSX SCSI Disk Device]]></Value>
</Data>
<Data>
<Item><![CDATA[Bytes/Sector]]></Item>
<Value><![CDATA[512]]></Value>
</Data>
<Data>
<Item><![CDATA[Media Loaded]]></Item>
<Value><![CDATA[Yes]]></Value>
</Data>
<Data>
<Item><![CDATA[Media Type]]></Item>
<Value><![CDATA[Fixed hard disk]]></Value>
</Data>
<Data>
<Item><![CDATA[Partitions]]></Item>
<Value><![CDATA[2]]></Value>
</Data>
<Data>
<Item><![CDATA[SCSI Bus]]></Item>
<Value><![CDATA[1]]></Value>
</Data>
<Data>
<Item><![CDATA[SCSI Logical Unit]]></Item>
<Value><![CDATA[0]]></Value>
</Data>
<Data>
<Item><![CDATA[SCSI Port]]></Item>
<Value><![CDATA[0]]></Value>
</Data>
<Data>
<Item><![CDATA[SCSI Target ID]]></Item>
<Value><![CDATA[0]]></Value>
</Data>
<Data>
<Item><![CDATA[Sectors/Track]]></Item>
<Value><![CDATA[63]]></Value>
</Data>
<Data>
<Item><![CDATA[Size]]></Item>
<Value><![CDATA[149.05 GB (160,039,272,960 bytes)]]></Value>
</Data>
<Data>
<Item><![CDATA[Total Cylinders]]></Item>
<Value><![CDATA[19,457]]></Value>
</Data>
<Data>
<Item><![CDATA[Total Sectors]]></Item>
<Value><![CDATA[312,576,705]]></Value>
</Data>
<Data>
<Item><![CDATA[Total Tracks]]></Item>
<Value><![CDATA[4,961,535]]></Value>
</Data>
<Data>
<Item><![CDATA[Tracks/Cylinder]]></Item>
<Value><![CDATA[255]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition]]></Item>
<Value><![CDATA[Disk #1, Partition #0]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition Size]]></Item>
<Value><![CDATA[117.19 GB (125,830,301,184 bytes)]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition Starting Offset]]></Item>
<Value><![CDATA[32,256 bytes]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition]]></Item>
<Value><![CDATA[Disk #1, Partition #1]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition Size]]></Item>
<Value><![CDATA[31.85 GB (34,200,714,240 bytes)]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition Starting Offset]]></Item>
<Value><![CDATA[125,830,333,440 bytes]]></Value>
</Data>
<Data>

I found a post asking for what I want but the solution doesn't work. The ET.parse is not found:

import xml.etree as ET
file = 'D:\\MsInfo\\msinfo.nfo'
tree = ET.parse(file)
root = tree.getroot()

for element in root.findall('Category'):
    value = element.find('Data')
    for child in value:
        print(child.tag ,":",child.text)

When using the above I get this:

"C:\Program Files (x86)\Python35-32\python.exe" "D:/MY STUFF/Programming/Python/testing.py"
Traceback (most recent call last):
  File "D:/MY STUFF/Programming/Python/testing.py", line 3, in <module>
    tree = ET.parse(file)
AttributeError: module 'xml.etree' has no attribute 'parse'

Process finished with exit code 1

This is a snippet from my code:

try:
    u = find("msinfo.nfo", s)
    for i in u:
        cpfotxt.insert('end', i + "\n")
        cpfotxt.yview(END)
        cpfotxt.insert('end', "================================= \n")
        with open(i, "r") as f:
            r = f.readlines()
            for line in r:
                if "Model" in line:
                    cpfotxt.insert('end', line + "\n")

If I remove the if "Model" in line: then it will dump everything into the Text widget fine.

This is how the entries look when the file is opened normally on Windows:

Any advice on how to pull lines I need from an nfo/XML file?

Also, when printing lines from an xml the font is bigger and double spaced. How can I make the line print the same way it would from a normal txt file?

You should be able to use ElementTree... Can you elaborate on the code you were using when trying to parse it using ET.parse? — drez90, Aug 30 '16 at 19:22
the error you're getting is because you are importing the wrong thing. you need to do import xml.etree.ElementTree as ET — drez90, Aug 30 '16 at 19:39
@d_rez90 Woops. Okay so the import is fixed and no more parse error. Now I need to figure out how to point the parser to the Disk Category so I get the right item and value. — sidnical, Aug 30 '16 at 19:52
If you post a general watered down structure of the XML I can show you how to get the elements you need — drez90, Aug 30 '16 at 19:53
@d_rez90 I added a full section from the .nfo file to the original post. An example showing me how to pull one or two items and values from the Disks section will help me understand how to structure my queries. Its the same as if you pull the msinfo file from your own windows system. let me know if you need more than I pasted. I didn't want to add 3000 lines to the post. — sidnical, Aug 30 '16 at 20:12

drez90 · Accepted Answer · 2016-08-31T02:57:28.183

You need to understand the structure of the XML and then look up the actual tags you want (Item and Value) instead of stopping at 'Data':

    item = element.find('Item') 
    print(item.tag ,":",item.text)
    value = element.find('Value') 
    print(value.tag ,":",value.text)

Your actual problem is that you need to change the import you use.

import xml.etree.ElementTree as ET

https://docs.python.org/2/library/xml.etree.elementtree.html
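As a minimal sketch of why the import matters: xml.etree is only the package, while parse() and fromstring() live in the xml.etree.ElementTree module. The inline sample string below is a made-up stand-in for the real .nfo file, used only so the snippet runs on its own.

```python
import xml.etree as etree_package      # the package itself: it has no parse()
import xml.etree.ElementTree as ET     # the module that provides parse() and fromstring()

# Hypothetical one-entry document standing in for msinfo.nfo.
sample = "<Data><Item><![CDATA[Model]]></Item><Value><![CDATA[Example Disk]]></Value></Data>"

package_has_parse = hasattr(etree_package, "parse")   # False: this is the AttributeError case
module_has_parse = hasattr(ET, "parse")               # True

# CDATA sections come back as ordinary element text.
root = ET.fromstring(sample)
print(root.find("Item").text, ":", root.find("Value").text)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring(sample).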

Edit: with the way that's structured, you can get a list of Data elements by saying

for data in root.findall('Data'):
    item = data.find('Item') 
    print(item.tag ,":",item.text)
    value = data.find('Value') 
    print(value.tag ,":",value.text)

Note that if the Data tags are not at the root level, you need to root.find() your way down to them. In other words, if the Data tags are enclosed in parent tags, first call root.find("Parent Tag") to reach the parent and search from there. Hope you get the gist of it.
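To illustrate the point about nesting, here is a small sketch. The sample document is a heavily shortened, assumed stand-in for the .nfo layout (the MsInfo root name and the entry values are illustrative). It shows that findall('Data') only looks at direct children, whereas a './/Data' path searches all descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the .nfo structure.
sample = """<MsInfo>
  <Category name="System Summary">
    <Category name="Disks">
      <Data><Item><![CDATA[Model]]></Item><Value><![CDATA[Example Disk]]></Value></Data>
      <Data><Item><![CDATA[Partitions]]></Item><Value><![CDATA[2]]></Value></Data>
    </Category>
  </Category>
</MsInfo>"""
root = ET.fromstring(sample)

direct = root.findall("Data")      # direct children of the root only: nothing here
nested = root.findall(".//Data")   # Data elements at any depth

# findtext() returns the text of the first matching child element.
pairs = [(d.findtext("Item"), d.findtext("Value")) for d in nested]
for item, value in pairs:
    print(item, ":", value)
```

Searching with './/Data' returns every Data element in the file, so on a real msinfo file it would usually need to be narrowed to one category first.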

Edit2: Looked at my own msinfo.nfo file and this worked:

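# Locate the <Category name="Disks"> element anywhere in the tree.
disks = root.find(".//Category[@name='Disks']")

# Each child of that category is a Data element holding one Item/Value pair.
for data in disks:
    item = data.find('Item')
    print(item.tag, ":", item.text)
    value = data.find('Value')
    print(value.tag, ":", value.text)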

Note: This uses XPath syntax to find the element, which is only available in ElementTree 1.3 (Python 2.7 and higher). You can also brute-force it by following the structure of the XML and traversing the tree until you get to Disks. The path was System Summary -> Components -> Storage -> Disks, and under Disks were those Data elements with Item and Value as children.
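The brute-force traversal described above could be sketched roughly as follows. The helper names find_category and category_lines are hypothetical, and the sample document (including the MsInfo root name and the values) is an assumed, shortened version of the path named in the note, not a copy of a real file.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened layout following System Summary -> Components -> Storage -> Disks.
sample = """<MsInfo>
  <Category name="System Summary">
    <Category name="Components">
      <Category name="Storage">
        <Category name="Disks">
          <Data><Item><![CDATA[Model]]></Item><Value><![CDATA[Example Disk]]></Value></Data>
          <Data><Item><![CDATA[Size]]></Item><Value><![CDATA[149.05 GB]]></Value></Data>
        </Category>
      </Category>
    </Category>
  </Category>
</MsInfo>"""

def find_category(root, names):
    """Walk down one named Category per level; return None if a level is missing."""
    node = root
    for name in names:
        node = node.find("Category[@name='%s']" % name)
        if node is None:
            return None
    return node

def category_lines(category):
    """Format each Data child of a category as 'Item: Value'."""
    return ["%s: %s" % (d.findtext("Item"), d.findtext("Value"))
            for d in category.findall("Data")]

root = ET.fromstring(sample)
disks = find_category(root, ["System Summary", "Components", "Storage", "Disks"])
lines = category_lines(disks)
for line in lines:
    print(line)
```

Each resulting line could then be sent to the Text widget the same way the question does it, for example with cpfotxt.insert('end', line + "\n").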

I didn't see your Edit2 until 10 minutes ago. I was able to use this to find everything I needed and output to my Text widget. Thank you for spending the extra time to help me understand how it works. — sidnical, Aug 31 '16 at 15:58

score 0 · Answer 2 · answered Aug 30 '16 at 19:35

Here is my code with your sample data. I know it could be written better, but I think it solves your problem :)
You have to find the root of the XML and then iterate over its text. You can also use other methods, such as iterfind, for better solutions.

from xml.etree import ElementTree

xml_file = "<xml><Item><![CDATA[Model]]></Item><Value><![CDATA[TOSHIB  MK1652GSX SCSI Disk Device]]></Value></xml>"
root = ElementTree.fromstring(xml_file)

# itertext() yields the text of every element in document order.
# A for loop works on both Python 2 and 3 (start.next() is Python 2 only).
for text in root.itertext():
    print(text)

Here is the output:

>>>Model
>>>TOSHIB  MK1652GSX SCSI Disk Device
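The iterfind alternative mentioned in this answer might look like the sketch below. It keeps only the values whose Item text is in a set of wanted names. The wanted set and the two-entry sample are illustrative assumptions, with values borrowed from the question's excerpt.

```python
from xml.etree import ElementTree

# Two entries taken from the question's sample, wrapped in their category.
sample = (
    "<Category name='Disks'>"
    "<Data><Item><![CDATA[Model]]></Item>"
    "<Value><![CDATA[TOSHIB  MK1652GSX SCSI Disk Device]]></Value></Data>"
    "<Data><Item><![CDATA[Partitions]]></Item><Value><![CDATA[2]]></Value></Data>"
    "</Category>"
)
root = ElementTree.fromstring(sample)

wanted = {"Model"}  # hypothetical set of Item names to keep

# iterfind() lazily yields matching children; filter on the Item text.
matches = [data.findtext("Value")
           for data in root.iterfind("Data")
           if data.findtext("Item") in wanted]

for value in matches:
    print(value)
```

Filtering on the parsed Item text rather than on raw lines keeps the Item name and its Value together, which a line-by-line "Model" in line check cannot do.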
