I'm writing a regex to match the following condition:
"shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive."
I wrote the following regex:
PATTERN = u'([\u0024\u0040\u0060]|(?![\u0000-\u00A0])|(?![\u8000-\udfff]))'
and use it to search as follows:
import re

text = "some str"  # the string to check
search = re.search(PATTERN, text, re.UNICODE)
What confuses me is that the range \u8000-\udfff in my pattern includes the surrogate code points (D800 through DFFF).
But running this regex in my script seems to work fine. Is this a correct way to use a regex to filter out such characters?
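For comparison, here is a minimal sketch of how I understand the condition could be checked, assuming Python 3. The names ORIGINAL, DISALLOWED and has_disallowed are hypothetical and only for illustration. It also shows that my pattern above can match an empty string, because its last two alternatives are bare lookaheads, so a successful search may not mean much. I may be misreading the rule, so treat this as a guess rather than a confirmed solution.

```python
import re

# The pattern from the question, written as a raw string here so that `re`
# itself interprets the \uXXXX escapes (supported since Python 3.3).
ORIGINAL = re.compile(r'([\u0024\u0040\u0060]|(?![\u0000-\u00A0])|(?![\u8000-\udfff]))')

# Hypothetical alternative: one character class listing every disallowed code point.
# Below 00A0 everything is disallowed except $ (0024), @ (0040) and ` (0060),
# so the class covers the gaps around those three, plus the surrogates D800-DFFF.
# Assumption: 00A0 itself is allowed, since the rule says "less than 00A0".
DISALLOWED = re.compile(
    r'[\u0000-\u0023\u0025-\u003f\u0041-\u005f\u0061-\u009f\ud800-\udfff]'
)


def has_disallowed(text):
    """Return True if text contains a code point the rule forbids."""
    return DISALLOWED.search(text) is not None


# The question's pattern matches the empty string at position 0 here.
print(repr(ORIGINAL.search("abc").group(0)))  # ''

print(has_disallowed("abc"))           # True: 'a' is 0061, which is below 00A0
print(has_disallowed("$@`\u00e9"))     # False: three exceptions plus 00E9
print(has_disallowed("\ud800"))        # True: lone surrogate
```

Listing the forbidden ranges explicitly avoids the lookaheads entirely, so a match always corresponds to an actual offending character.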