
I want to use PEP 634 – Structural Pattern Matching to match an lxml HtmlElement that has a particular attribute. The attributes are accessible through the .attrib attribute, which returns an instance of the _Attrib class, and IIUC that class has all the methods it needs to be a collections.abc.Mapping.

The PEP says this:

For a mapping pattern to succeed the subject must be a mapping, where being a mapping is defined as its class being one of the following:

  • a class that inherits from collections.abc.Mapping
  • a Python class that has been registered as a collections.abc.Mapping
  • ...

Here's what I'm trying to do, but it doesn't print the href:

from collections.abc import Mapping
from lxml.html import HtmlElement, fromstring
el = fromstring('<a href="https://stackoverflow.com/">StackOverflow</a>')
Mapping.register(type(el.attrib))  # lxml.etree._Attrib
assert isinstance(el.attrib, Mapping)  # It's True even before registering _Attrib.

match el:
    case HtmlElement(tag='a', attrib={'href': href}):
        print(href)

This matches and prints attrib:

match el:
    case HtmlElement(tag='a', attrib=Mapping() as attrib):
        print(attrib)

This does not match, as expected:

match el:
    case HtmlElement(tag='a', attrib=list() as attrib):
        print(attrib)

I also tried this and it works:

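class Upperer:
    def __getitem__(self, key): return key.upper()
    def __len__(self): return 1  # CPython checks len(subject) >= number of keys first
    def get(self, key, default): return self[key]  # keys are looked up via the two-argument .get()
Mapping.register(Upperer)  # It doesn't work without this line.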
match Upperer():
    case {'href': href}:
        print(href)  # Prints "HREF"

I understand that using XPath/CSS selectors would be easier, but at this point I just want to know what the problem is with the _Attrib class or my code.

Also, I don't want to unpack the element and convert the _Attrib instance to a dict as follows:

match el.tag, dict(el.attrib):
    case 'a', {'href': href}:
        print(href)

or use guards:

match el:
    case HtmlElement(tag='a', attrib=attrs) if 'href' in attrs:
        print(attrs['href'])

Both work, but they don't look right. I'd like to find a way to make the original case HtmlElement(tag='a', attrib={'href': href}) work, or something very close to it.

Python version I'm using is 3.11.4.

qyryq
