spaCy SpanCategorizer performance improvement

Question

Dear community,

How much has the SpanCategorizer improved your models? I am curious. I have been using textcat to categorize text, with a recall of about 85%, and I wonder how much of a difference a span categorizer could make. I am trying to predict whether a question in a questionnaire will elicit confidential (personally identifiable) information, such as a name, telephone number, address, or social security number. Some questions can be very long, and then the textcat gets confused. I expect that being able to catch key terms should improve the prediction, but I wonder how much improvement others have seen in their models. Many thanks for your answers!
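Before comparing models, it may help to confirm that long questions are really where the textcat loses recall. Below is a minimal sketch in plain Python (no spaCy required), assuming the gold labels and the textcat's thresholded predictions are available as parallel lists of booleans, where `True` means "asks for personal information". The function names and the word-count cutoff are hypothetical choices for illustration, not part of spaCy.

```python
from typing import Dict, List, Tuple


def recall(gold: List[bool], pred: List[bool]) -> float:
    """Recall for the positive class (question asks for PII): TP / (TP + FN)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    # Return NaN when the bucket contains no positive examples at all.
    return tp / (tp + fn) if (tp + fn) else float("nan")


def recall_by_length(
    questions: List[str], gold: List[bool], pred: List[bool], cutoff: int = 30
) -> Dict[str, float]:
    """Split questions into 'short'/'long' by whitespace word count
    (hypothetical cutoff) and report recall per bucket plus overall."""
    buckets: Dict[str, Tuple[List[bool], List[bool]]] = {
        "short": ([], []),
        "long": ([], []),
    }
    for text, g, p in zip(questions, gold, pred):
        key = "long" if len(text.split()) > cutoff else "short"
        buckets[key][0].append(g)
        buckets[key][1].append(p)
    result = {name: recall(g, p) for name, (g, p) in buckets.items()}
    result["overall"] = recall(gold, pred)
    return result


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    questions = [
        "What is your name?",
        "What is your phone number?",
        "Do you like tea?",
        "Before we continue with the survey, please tell us your home address",
    ]
    gold = [True, True, False, True]
    pred = [True, False, False, False]
    print(recall_by_length(questions, gold, pred, cutoff=5))
    # {'short': 0.5, 'long': 0.0, 'overall': 0.3333333333333333}
```

If recall on the long bucket is clearly lower than on the short one, that supports the intuition that key terms get diluted in long questions; if the buckets are similar, the errors likely come from elsewhere.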

Hey, this is not really a Stack Overflow question; questions here are supposed to be about specific programming issues. You'll probably have more luck asking on the spaCy forums: https://github.com/explosion/spaCy/discussions — polm23, Oct 13 '22 at 04:18
There have been similar questions on the forums before. While it's possible that annotations like NER or spancat might help, they're not easy to use as input, and research on the topic suggests that the approach doesn't work very well. https://github.com/explosion/spaCy/discussions/10470 — polm23, Oct 13 '22 at 04:20


0 Answers