
What would be a "pythonic" way of expressing regexes with Python 3.10's (and later) match-case statements (i.e., structural pattern matching)? It intuitively sounds like a good use case, but I cannot figure out a "clean" way of expressing something like:

s = "ham@spam.com"
match s:
    # cases that match s against regexes go here
Tomerikoo
gustafbstrom
  • IIRC, match is not that close conceptually. Matching is concerned with structure and types; all *you* have is a string, with a constrained format to boot. You can hit a case (based on structure and types) and use a regex as a guard to confirm the case, but that doesn't give the regex much weight. Or you could regex-split or use named groups and then feed the results to the match, but that happens *before* the matching itself. – JL Peyret Dec 07 '22 at 17:34
  • @JLPeyret No, my conclusion is along those lines as well. I just hoped there was some opportunity I was missing, as pattern matching with regexes would be such an obviously straightforward use case. – gustafbstrom Dec 07 '22 at 19:28
  • I had the same idea. For me it would also make sense to use such dynamic matching, but this is not how pattern matching in Python works. Python's pattern matching looks at structure, types, and literal values, not at regexes. A condition (guard) can be used, but its purpose is only to provide a yes-or-no decision, nothing more. What you would need is the result of the condition bound to a match variable, and that is not possible. Maybe the Python team will invent something in the future. I use the `:=` operator instead. – habrewning Jul 16 '23 at 20:27

2 Answers


The following works:

import re

examples = ["john.doe@acme.com", "https://www.web-site.com", "Doe, John", "junk", r"\\computer\folder\file", "smtp://mailserver.acme.com"]

patterns = [
    r"[A-Za-z0-9\.\-]+@[A-Za-z0-9\.\-]+$",  # e-mail address pattern => 0
    r"http[s]{0,1}://[A-Za-z0-9\/\.\-]*$",  # url pattern => 1
    r"[A-Za-z]+, [A-Za-z]+$",               # name pattern => 2
]

for text in examples:
    # Tuple of the indices of every pattern that matches text, e.g. (0,) for an e-mail address
    match tuple(i for i, p in enumerate(patterns) if re.match(p, text)):
        case (0,):
            print(f"'{text}' is an e-mail address")
        case (1,):
            print(f"'{text}' is a url")
        case (2,):
            print(f"'{text}' is a name")
        case (0, 1) | (0, 2) | (1, 2) | (0, 1, 2):
            print(f"'{text}' is impossible, because the patterns are mutually exclusive")
        case _:
            print(f"'{text}' is unrecognizable")

and prints:

'john.doe@acme.com' is an e-mail address
'https://www.web-site.com' is a url
'Doe, John' is a name
'junk' is unrecognizable
'\\computer\folder\file' is unrecognizable
'smtp://mailserver.acme.com' is unrecognizable
racemaniac

I like to use the walrus operator (`:=`) and then match against the result of one of the `re.Match` object's methods or attributes: `m.groupdict()`, `m.groups()`, or `m.regs` (an undocumented attribute).

import re

if m := re.match(r"(?P<mailbox>.+?)@(?P<domain>.+)", "mailbox@example.com"):
    match m.groupdict():
        case {'domain': "gmail.com"}:
            print("A Google e-mail address was found")
        case {'domain': domain}:
            print(f"domain was bound to {domain!r}")
else:
    pass  # No match

It would also be possible to match against m.groups() to compare with

Greg klupar