Questions tagged [lxml]

lxml is a full-featured, high-performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree package is used for XML processing, and lxml.html handles HTML. For badly broken HTML, lxml.html.soupparser delegates parsing to BeautifulSoup, and lxml.html.html5parser uses html5lib to apply the HTML5 parsing algorithm.
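
As a minimal sketch of the lxml.etree usage described above (assuming lxml is installed; the sample document and its element names are made up for illustration), parsing a string and querying it might look like this:

```python
from lxml import etree

# Hypothetical sample document; the element names are illustrative only.
xml = (
    b"<library>"
    b"<book id='1'><title>Dune</title></book>"
    b"<book id='2'><title>Emma</title></book>"
    b"</library>"
)

# Parse the bytes into an element tree and get the root element.
root = etree.fromstring(xml)

# ElementTree-style navigation: iterate over <book> elements and read <title>.
titles = [book.findtext("title") for book in root.iter("book")]

# XPath query: collect the id attribute of every <book>.
ids = root.xpath("//book/@id")

print(titles)
print(ids)
```

The same tree supports both the ElementTree-style API and XPath, so either can be used depending on how complex the query is.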

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions
2
votes
1 answer

XPath Child Traversal Methods and Performance

I'm using lxml on Python 2.7. Given a node, node, and a child, child_element, what is the difference between node.xpath('./child_element') and node.xpath("*[local-name()='child_element']")? In other words, what's going on under the hood here? Is…
AutomaticStatic
  • 1,661
  • 3
  • 21
  • 42
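
For readers skimming the XPath question above, here is a small sketch of how the two expressions differ in what they match. It assumes lxml is installed, uses a made-up document and namespace URI, and says nothing about performance:

```python
from lxml import etree

# Hypothetical document: one child in no namespace, one in a made-up namespace.
xml = b"""<root xmlns:x="http://example.com/ns">
  <child_element>plain</child_element>
  <x:child_element>namespaced</x:child_element>
</root>"""

node = etree.fromstring(xml)

# An unprefixed name test only matches elements that are in no namespace.
by_name = node.xpath("./child_element")

# local-name() ignores the namespace, so both children match.
by_local_name = node.xpath("*[local-name()='child_element']")

print([e.text for e in by_name])
print([e.text for e in by_local_name])
```

In a document without namespaces, both expressions select the same elements; the difference only shows up once a namespace is involved.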
2
votes
1 answer

Can you modify only a text string in an XML file and still maintain integrity and functionality of .docx encasement?

I want to enter data into a Microsoft Excel Spreadsheet, and for that data to interact and write itself to other documents and webforms. With success, I am pulling data from an Excel spreadsheet using xlwings. Right now, I’m stuck working with…
Murcielago
  • 1,030
  • 1
  • 14
  • 24
2
votes
1 answer

Python 3.4 : How to do xml validation

I'm trying to validate XML against an XSD in Python. I was successful using the lxml package, but the problem started when I tried to port my code to Python 3.4. I tried to install lxml for version 3.4. Looks like my enterprise Linux doesn't…
Satish Jonnala
  • 619
  • 3
  • 9
  • 21
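
As context for the validation question above, a minimal sketch of XSD validation with lxml.etree could look like the following. It assumes lxml is installed; the schema and the two documents are invented for illustration and are not taken from the question:

```python
from lxml import etree

# Hypothetical schema: a <note> element containing a single <to> string.
xsd = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Build a validator from the parsed schema document.
schema = etree.XMLSchema(etree.fromstring(xsd))

valid_doc = etree.fromstring(b"<note><to>Ann</to></note>")
invalid_doc = etree.fromstring(b"<note><from>Bob</from></note>")

# validate() returns a bool instead of raising an exception.
valid_result = schema.validate(valid_doc)
invalid_result = schema.validate(invalid_doc)

print(valid_result)
print(invalid_result)

# After a failed validation, error_log holds the reasons.
for error in schema.error_log:
    print(error.line, error.message)
```

If an exception is preferred over a boolean, schema.assertValid(doc) raises etree.DocumentInvalid for a document that does not conform.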
2
votes
2 answers

Python lxml - using the xml:lang attribute to retrieve an element

I have some xml which has multiple elements with the same name, but each is in a different language, for example: Les Tudors Die Tudors
python xml lxml
asked Jul 06 '15 at 16:10
Nick
  • 141
  • 11
2
votes
1 answer

CSS selectors to query by attribute alone, with LXML

I want to get tags where the attribute contains a {% like these examples: <a href="{% route xy %}></a> <img src="{% static xy %}/> The attribute key does not matter. The best I can come up with is tag.cssselect[href*={%] but it matches only hrefs…
python css lxml
asked Jul 06 '15 at 13:04
Jesvin Jose
  • 22,498
  • 32
  • 109
  • 202
2
votes
3 answers

Parsing XML with Python - accessing elements

I'm using lxml to parse some xml, but for some reason I can't find a specific element. I'm trying to access the <Constant> elements. Here's an xml snippet: </rdf:Description> </rdf:RDF> </MiriamAnnotation> <ListOfSubstrates> …
python xml python-2.7 lxml
asked Jul 02 '15 at 19:18
Charon
  • 2,344
  • 6
  • 25
  • 44
2
votes
1 answer

BeautifulSoup with XML fails to parse full unicode strings

Edited. I am using BeautifulSoup with lxml to parse XML documents from an external source. Bizarrely, on some documents, the parser appears to give up in the middle of the text and cut the document short. I have whittled this down to a precise test…
python beautifulsoup lxml python-unicode
asked Jun 29 '15 at 22:46
cmuk
  • 473
  • 5
  • 9
2
votes
1 answer

lxml etree get all text before element

How to get all text before an element in a etree separated from the text after the element? from lxml import etree tree = etree.fromstring(''' <a> find <b> the </b> text <dd></dd> <c> …
python xml xml-parsing lxml elementtree
asked Jun 24 '15 at 15:14
Milla Well
  • 3,193
  • 3
  • 35
  • 50
2
votes
1 answer

Python 2.7 Etree/lxml minimizing

Im using lxml/Etree to parse and write to XSD documents. I have the basic structure tree = ET.parse('file.xsd') # do stuff tree.write('output.xsd') But tags get minimized in some instances, for example: <Cars> <Car…
python xsd formatting lxml
asked Jun 23 '15 at 09:40
enrm
  • 645
  • 1
  • 8
  • 22
2
votes
2 answers

Identifying branches different in tag structure

I'm hoping to check if two html are different by tags only without considering the text and pick out those branch(es). For example : html_1 = """ <p>i love it</p> """ html_2 = """ <p>i love it really</p> """ They share the same tag structure, so…
python parsing dom beautifulsoup lxml
asked Jun 22 '15 at 16:56
Kar
  • 6,063
  • 7
  • 53
  • 82
2
votes
3 answers

Combine multiple tags with lxml

I have an html file which looks like: ... <p> <strong>This is </strong> <strong>a lin</strong> <strong>e which I want to </strong> <strong>join.</strong> </p> <p> 2. <strong>But do not </strong> <strong>touch…
python html xpath lxml
asked Jun 15 '15 at 03:25
lpounng
  • 570
  • 6
  • 27
2
votes
1 answer

LXML - parse td content within tr tag

I want to parse each individual statistic from the yahoo finance tables for formatting purposes - when parsing the entire table the formatting is terrible!! I am currently using the code below and I would have to repeat the 4 lines of contentA code…
python html python-3.x lxml
asked Jun 14 '15 at 12:56
Aran Freel
  • 3,085
  • 5
  • 29
  • 42
2
votes
1 answer

LXML to write in unicode?

I am currently using lxml to write a file. I build the node and then I write it to a file using etree.tostring(node, pretty_print=True). However, it seems to be using htmlencoding -- <Synopsis> Abila schließlich die ersten sechs Aufgaben zu…
python unicode lxml
asked Jun 05 '15 at 01:21
David542
  • 104,438
  • 178
  • 489
  • 842
2
votes
1 answer

How to parse this XML response in Python?

This is my XML file: <?xml version="1.0" ?> <Items> <Item> <ASIN>3570102769</ASIN> …
python xml parsing xpath lxml
asked May 11 '15 at 08:49
Julian Baehr
  • 27
  • 4
2
votes
1 answer

lxml.html ignoring body class attributes

I am using lxml.html for parsing html content. But I don't understand why lxml is dropping "body" tag attributes. Tried using both lxml.html.parse and lxml.html.document_fromstring as suggested here But still it is not working. Example html…
iframe html-parsing lxml lxml.html
asked May 09 '15 at 18:09
Karan
  • 46
  • 3