I am getting an unexpected result when matching regular expressions with spaCy (version 3.1.3). I define a simple regex to identify a digit. Then I create strings made of a digit followed by a letter and try to match them. Everything works as expected except for the letters g, m and t.
Here is a minimal example:
import string

from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()

# Match any single token whose text contains a digit.
pattern = [{"TEXT": {"REGEX": r"\d"}}]
matcher = Matcher(nlp.vocab)
matcher.add("usage", [pattern])

for l in string.ascii_lowercase:
    doc = nlp(f"2{l}")
    matches = matcher(doc)
    for match_id, start, end in matches:
        string_id = nlp.vocab.strings[match_id]
        span = doc[start:end]
        print(l, span.text)
Result:
a 2a
b 2b
c 2c
d 2d
e 2e
f 2f
g 2 # EXPECTED 2g
h 2h
i 2i
j 2j
k 2k
l 2l
m 2 # EXPECTED 2m
n 2n
o 2o
p 2p
q 2q
r 2r
s 2s
t 2 # EXPECTED 2t
u 2u
v 2v
w 2w
x 2x
y 2y
z 2z
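
Since the pattern matches exactly one token, the matched span can only be as long as the token the tokenizer produced. The following is a minimal diagnostic sketch, assuming the difference comes from how the string is tokenized rather than from the regex itself. `token_texts` is a helper name introduced here for illustration only; `nlp.tokenizer.explain` is spaCy's debugging method that reports which tokenizer rule produced each piece.

```python
import string

from spacy.lang.en import English

nlp = English()


def token_texts(text):
    """Return the texts of the tokens spaCy produces for `text` (hypothetical helper)."""
    return [token.text for token in nlp(text)]


for letter in string.ascii_lowercase:
    text = f"2{letter}"
    tokens = token_texts(text)
    # Only report strings that were split into more than one token.
    if len(tokens) > 1:
        # explain() returns (rule, substring) pairs for each piece of the text.
        print(text, tokens, nlp.tokenizer.explain(text))
```

If only the g, m and t strings are printed, the matcher is working on a token that is just "2", which would be consistent with the output above.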