Look at this example:
import bs4
# xml parser
print(bs4.BeautifulSoup('<price>£4</price>', 'xml'))
# prints:
<?xml version="1.0" encoding="utf-8"?>
<price>4</price>
# html (lxml) parser
print(bs4.BeautifulSoup('<span>£4</span>', 'lxml'))
# prints:
<html><body><span>£4</span></body></html>
Notice the £ sign. Why does the XML parser remove it, and what should I do to keep it in the output? I need XML parsing because the document contains some paired tags (e.g. <link>) that the lxml HTML parser parses incorrectly.