How do you add custom punctuation (e.g. asterisk) to the infix list in a Tokenizer and have that recognized by nlp.explain as punctuation?
I would like to take characters from my set of custom infixes that are not currently recognized as punctuation and add them to the punctuation list, so that the Matcher can use them when matching {'IS_PUNCT': True}.
An answer to a similar issue was provided here: "How can I add custom signs to spaCy's punctuation functionality?"
The only problem is that I am unable to package the newly recognized punctuation with the model. As a side note, the tokenizer already splits on infixes containing the desired punctuation, so all that is left is propagating this to the Matcher.
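
To make the setup concrete, here is a minimal sketch of the kind of runtime change being described. It assumes spaCy v3 (the list-of-patterns form of Matcher.add), a blank English pipeline, and "+" as a stand-in for the custom character. CUSTOM_CHARS and the pattern name CUSTOM_PUNCT are made up for illustration, and this is not necessarily what the linked answer does.

```python
import re

import spacy
from spacy.matcher import Matcher
from spacy.util import compile_infix_regex

# Hypothetical set of characters to treat as punctuation (assumption for this sketch).
CUSTOM_CHARS = ["+"]

nlp = spacy.blank("en")

# Extend the default infix patterns so the tokenizer splits on the custom characters.
infixes = list(nlp.Defaults.infixes) + [re.escape(c) for c in CUSTOM_CHARS]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

# Flip the IS_PUNCT flag on the corresponding lexemes in the in-memory vocab.
for char in CUSTOM_CHARS:
    nlp.vocab[char].is_punct = True

# The Matcher reads the same lexeme flag when evaluating IS_PUNCT.
matcher = Matcher(nlp.vocab)
matcher.add("CUSTOM_PUNCT", [[{"IS_PUNCT": True}]])

doc = nlp("foo+bar")
tokens = [t.text for t in doc]
matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(tokens, matched)
```

The flag change above lives only in the in-memory vocab of the running process. Getting the same behavior out of the packaged model is the open part of the question.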