I'm having trouble applying a regular expression to the Split() operation in the HuggingFace tokenizers library. The library documents the following input for Split():
pattern (str or Regex) – A pattern used to split the string. Usually a string or a Regex
In my code I am applying the Split() operation like so:
tokenizer.pre_tokenizer = Split(pattern="[A-Z]+", behavior='isolated')
but it's not working because [A-Z]+ is being interpreted as a literal string, not a regular expression. I also tried passing a compiled pattern from Python's re module, to no avail:
import re

pattern = re.compile("[A-Z]+")
tokenizer.pre_tokenizer = Split(pattern=pattern, behavior='isolated')
I get the following error:
TypeError: Can't convert re.compile('[A-Z]+') (re.Pattern) to Union[str, tokenizers.Regex]
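The error message names tokenizers.Regex as the accepted type, so one thing to try is wrapping the pattern in that class rather than in Python's re module. A minimal sketch, assuming Regex can be imported from the top-level tokenizers package; the sample string is made up for illustration:

```python
from tokenizers import Regex
from tokenizers.pre_tokenizers import Split

# Wrap the pattern in tokenizers.Regex so Split treats it as a regular
# expression instead of a literal string.
pre_tokenizer = Split(pattern=Regex("[A-Z]+"), behavior="isolated")

# pre_tokenize_str returns a list of (piece, (start, end)) tuples;
# keep only the text pieces to inspect how the sample string is split.
sample = "helloWORLDfoo"  # hypothetical input
pieces = [piece for piece, _offsets in pre_tokenizer.pre_tokenize_str(sample)]
print(pieces)
```

If the pieces look right, the same object can be assigned to tokenizer.pre_tokenizer in place of the string-based Split.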