I'm trying to learn XML so that I can parse GChat logs downloaded from Gmail via IMAP, and I'm using lxml to do it. Each chat message is formatted like so:
<cli:message to="email@gmail.com" iconset="square" from="email@gmail.com" int:cid="insertid" int:sequence-no="1" int:time-stamp="1236608405935" xmlns:int="google:internal" xmlns:cli="jabber:client">
<cli:body>Nikko</cli:body>
<met:google-mail-signature xmlns:met="google:metadata">0c7ef6e618e9876b</met:google-mail-signature>
<x stamp="20090309T14:20:05" xmlns="jabber:x:delay"/>
<time ms="1236608405975" xmlns="google:timestamp"/>
</cli:message>
When I try to build the XML tree like so:
root = etree.Element("cli:message")
I get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2568, in lxml.etree.Element (src/lxml/lxml.etree.c:52878)
  File "apihelpers.pxi", line 126, in lxml.etree._makeElement (src/lxml/lxml.etree.c:11497)
  File "apihelpers.pxi", line 1542, in lxml.etree._tagValidOrRaise (src/lxml/lxml.etree.c:23956)
ValueError: Invalid tag name u'cli:message'
When I try to escape it like so:
root = etree.Element("cli\:message")
I get the exact same error.
The headers of each chat also include the following, which may be relevant:
Content-Type: text/xml; charset=utf-8
Content-Transfer-Encoding: 7bit
Does anyone know what's going on here?
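For comparison, here is a minimal sketch of the namespace-qualified form that lxml does seem to accept. It assumes the `cli` and `int` prefixes are only shorthand for the `jabber:client` and `google:internal` URIs declared in the `xmlns:` attributes of the sample message; the constant names `CLI_NS` and `INT_NS` are mine, not something from the chat data.

```python
from lxml import etree

# Namespace URIs copied from the xmlns declarations in the sample message.
# CLI_NS and INT_NS are illustrative names chosen for this sketch.
CLI_NS = "jabber:client"
INT_NS = "google:internal"

# lxml takes namespaced names in "{uri}localname" form; the prefix-to-URI
# mapping is passed separately through nsmap and only affects serialization.
nsmap = {"cli": CLI_NS, "int": INT_NS}
root = etree.Element("{%s}message" % CLI_NS, nsmap=nsmap)
root.set("{%s}sequence-no" % INT_NS, "1")

body = etree.SubElement(root, "{%s}body" % CLI_NS)
body.text = "Nikko"

serialized = etree.tostring(root).decode("utf-8")
print(serialized)

# Parsing a message back reports the tag in the same "{uri}localname" form.
parsed = etree.fromstring(serialized)
print(parsed.tag)
```

If this is the right reading, the prefixed name `cli:message` only ever appears in the serialized text, while the tag lxml works with is the URI-qualified one.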