How to use structural pattern matching (match-case) with regex?

Question

What would be a "pythonic" way of using regexes with the match-case statement (structural pattern matching) in Python 3.10 and later? It intuitively sounds like a good use case, but I cannot figure out a "clean" way of expressing something like:

s = "ham@spam.com"
match s:
    # cases that match s using a regex

IIRC the match is not that close conceptually. Structural matching is concerned with structure and types, and all *you* have is a string, with a constrained format to boot. You can hit a case (based on structure and types) and use a regex as a guard to confirm the case, but that doesn't fundamentally give much weight to the regex. Or you could regex-split or use named groups and then feed the results to the match statement, but that happens *before* the matching itself. — JL Peyret, Dec 07 '22 at 17:34
@JLPeyret No, my conclusion is along those lines as well. I just hoped there was some option I was missing, as pattern matching using regexes would be such an obviously straightforward use case. — gustafbstrom, Dec 07 '22 at 19:28
I had the same idea. For me it would make sense to also support this kind of dynamic matching, but this is not how pattern matching in Python works. Python's pattern matching looks only at structure and types. A condition (guard) can be used, but its purpose is only to provide a yes or no decision, nothing more. What you would need is the result of the condition bound to a capture variable, and the pattern syntax itself does not offer that. Maybe the Python team will invent something in the future. I use the `:=` operator instead. — habrewning, Jul 16 '23 at 20:27

score 1 · Answer 1 · answered Jul 29 '23 at 16:24

The following works:

import re

examples = ["john.doe@acme.com", "https://www.web-site.com", "Doe, John", "junk", r"\\computer\folder\file", "smtp://mailserver.acme.com"]

for text in examples:

    match tuple(
            i for i, e in 
                enumerate(
                    [   
                        r"[A-Za-z0-9\.\-]+@[A-Za-z0-9\.\-]+$", # e-mail address pattern => 0 
                        r"http[s]{0,1}://[A-Za-z0-9\/\.\-]*$", # url pattern => 1 
                        r"[A-Za-z]+, [A-Za-z]+$" # name pattern => 2
                    ]
                ) 
                if re.match(e, text) is not None
            ):
        
        case (0,): print(f"'{text}' is an e-mail address")

        case (1,): print(f"'{text}' is a url")
        
        case (2,): print(f"'{text}' is a name")

        case (0, 1) | (0, 2) | (1,2) | (0, 1, 2): print(f"'{text}' is impossible, because the patterns are mutually exclusive")
        
        case _ : print(f"'{text}' is unrecognizable")

and results in:

'john.doe@acme.com' is an e-mail address
'https://www.web-site.com' is a url
'Doe, John' is a name
'junk' is unrecognizable
'\\computer\folder\file' is unrecognizable
'smtp://mailserver.acme.com' is unrecognizable

score 0 · Answer 2 · answered Jul 31 '23 at 00:43

I like to use the walrus operator and then match against the value of one of the `re.Match` object's methods or attributes:

m.groupdict(), m.groups(), or m.regs

import re

if m := re.match(r"(?P<mailbox>.+?)@(?P<domain>.+)", "mailbox@example.com"):
    # groupdict() returns a dict of the named groups, so mapping patterns apply
    match m.groupdict():
        case {'domain': "gmail.com"}:
            print("A Google email address was found")
        case {'domain': domain}:
            print(f"domain was bound to {domain!r}")
else:
    pass  # No match

It would also be possible to match against m.groups() instead.
