I want to use PEP 634 – Structural Pattern Matching to match an HtmlElement that has a particular attribute. The attributes are accessible through the .attrib attribute, which returns an instance of the _Attrib class, and IIUC it has all the methods needed to be a collections.abc.Mapping.
The PEP says this:
For a mapping pattern to succeed the subject must be a mapping, where being a mapping is defined as its class being one of the following:
- a class that inherits from collections.abc.Mapping
- a Python class that has been registered as a collections.abc.Mapping
- ...
Here's what I'm trying to do, but it doesn't print the href:
from collections.abc import Mapping
from lxml.html import HtmlElement, fromstring

el = fromstring('<a href="https://stackoverflow.com/">StackOverflow</a>')
Mapping.register(type(el.attrib))  # lxml.etree._Attrib
assert isinstance(el.attrib, Mapping)  # It's True even before registering _Attrib.

match el:
    case HtmlElement(tag='a', attrib={'href': href}):
        print(href)
This matches and prints attrib:
match el:
    case HtmlElement(tag='a', attrib=Mapping() as attrib):
        print(attrib)
This does not match, as expected:
match el:
    case HtmlElement(tag='a', attrib=list() as attrib):
        print(attrib)
I also tried this and it works:
class Upperer:
    def __getitem__(self, key): return key.upper()
    def __len__(self): return 1
    def get(self, key, default): return self[key]

Mapping.register(Upperer)  # It doesn't work without this line.

match Upperer():
    case {'href': href}:
        print(href)  # Prints "HREF"
I understand that using XPath/CSS selectors would be easier, but at this point I just want to know what the problem is with the _Attrib class and my code.
Also, I don't want to unpack the element and convert the _Attrib instance to a dict as follows:
match el.tag, dict(el.attrib):
    case 'a', {'href': href}:
        print(href)
or use guards:
match el:
    case HtmlElement(tag='a', attrib=attrs) if 'href' in attrs:
        print(attrs['href'])
It works, but it doesn't look right. I'd like to find a solution so that the original case HtmlElement(tag='a', attrib={'href': href}) works, or something very close to it.
The Python version I'm using is 3.11.4.