My XML contains a CDATA section. I want to preserve the CDATA section when parsing, and then strip it myself. Can someone help with the following?
The default parser does not preserve it:
$ from io import StringIO
$ from lxml import etree
$ xml = '<Subject> My Subject: 美海軍研究船勘查台海水文? 船<![CDATA[é]]>€ </Subject>'
$ tree = etree.parse(StringIO(xml))
$ tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船é€ '
This post seems to suggest that the parser option strip_cdata=False may keep the CDATA, but it has no effect:
$ parser=etree.XMLParser(strip_cdata=False)
$ tree = etree.parse(StringIO(xml), parser=parser)
$ tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船é€ '
Using strip_cdata=True, which should be the default, yields the same result:
$ parser=etree.XMLParser(strip_cdata=True)
$ tree = etree.parse(StringIO(xml), parser=parser)
$ tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船é€ '
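One thing that may be worth checking: as far as I understand, .text returns the character data whether or not it was wrapped in CDATA, so it may not be the right place to look for a difference. Below is a minimal sketch, assuming the effect of strip_cdata only becomes visible when the tree is serialized again with etree.tostring. The helper name parse_and_serialize is made up for illustration, the subject string is shortened, and etree.fromstring is used in place of etree.parse(StringIO(...)).

```python
from lxml import etree

# Shortened version of the document from the question.
xml = '<Subject> My Subject: 船<![CDATA[é]]>€ </Subject>'

def parse_and_serialize(strip_cdata):
    # Hypothetical helper: parse with the given strip_cdata setting and
    # return both the .text value and the re-serialized element.
    parser = etree.XMLParser(strip_cdata=strip_cdata)
    root = etree.fromstring(xml, parser=parser)
    return root.text, etree.tostring(root, encoding='unicode')

text_kept, kept = parse_and_serialize(False)
text_stripped, stripped = parse_and_serialize(True)

# Compare what .text reports with what serialization reports.
print(text_kept == text_stripped)
print(kept)
print(stripped)
```

If this assumption holds, the two .text values are identical while the serialized strings differ in whether the <![CDATA[...]]> wrapper is still there.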