This is not a DFS algorithm question or a library-suggestion question. It is specifically about `lxml.etree` (v4). I use Python 3.9.
This library, `lxml.etree`, provides a way to iterate over the `ElementTree` into which an HTML document is parsed.
The iterator performs a DFS, but in pre-order (using the term from the Wikipedia article on DFS): elements are yielded in the order in which they are first visited. My question is: what is the easiest way to get a post-order iteration instead, where each element is yielded only after all of its descendants?
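For reference, post-order itself can be sketched as a small recursive generator over an lxml element, because iterating an lxml element yields its direct children. The name `iter_postorder` is hypothetical (not part of lxml), and the recursion assumes the tree is shallower than Python's recursion limit.

```python
from typing import Iterator

from lxml import etree


def iter_postorder(element: etree._Element) -> Iterator[etree._Element]:
    """Hypothetical helper: yield all descendants of `element`, then `element` (DFS post-order)."""
    # Iterating an lxml element yields its direct children
    # (comments and processing instructions included, if present).
    for child in element:
        yield from iter_postorder(child)
    # A node is yielded only after its whole subtree has been yielded.
    yield element


root = etree.fromstring(
    '<div class="c1"><span class="c11">11</span><span class="c12">12</span></div>'
)
print([el.get("class") for el in iter_postorder(root)])  # ['c11', 'c12', 'c1']
```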
Here is a minimal example demonstrating that the default order of `iter()` is pre-order. I created a dummy function, so the second test obviously fails. I need an implementation of `_iter_postorder` for its assertion to hold.
```python
import unittest
from typing import List
from xml.etree.ElementTree import ElementTree

from lxml import etree

HTML1 = """
<div class="c1">
    <span class="c11">11</span>
    <span class="c12">12</span>
</div>
"""


def _iter_postorder(tree: ElementTree) -> List[str]:
    return []


class EtreeElementTests(unittest.TestCase):
    def test_dfs_preordering(self):
        """ regular iter() is dfs preordering"""
        root = etree.HTML(HTML1, etree.XMLParser())
        tree = ElementTree(root)
        result = [el.attrib['class'] for el in tree.iter()]
        self.assertListEqual(result, ["c1", "c11", "c12"])

    def test_dfs_postordering(self):
        root = etree.HTML(HTML1, etree.XMLParser())
        tree = ElementTree(root)
        result = _iter_postorder(tree)
        self.assertListEqual(result, ["c11", "c12", "c1"])
```