
I am trying to predict entities with a custom-trained NER model in spaCy. I read in https://github.com/explosion/spaCy/pull/8855 that a confidence score for each entity can be obtained using spancat, but I am confused about how to make that work. As I understand it, we have to train a pipeline that includes the spancat component. The training config file contains this segment:

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
batch_size = 1000

Do we have to change this to

[nlp]
lang = "en"
pipeline = ["tok2vec","ner","spancat"]
batch_size = 1000

for spancat to work?

Then, after training, when predicting entities in unseen text, do we have to use

doc = nlp(data_to_be_predicted)
spans = doc.spans["spancat"]  # SpanGroup
print(spans.attrs["scores"])  # list of numbers, same length as the SpanGroup

to get the confidence scores?

I am using spaCy 3.1.3. According to the documentation, I believe this feature has been released by now.

Koen Hollander
imhans33
  • It looks like you asked this same question a few days ago with a different account? https://stackoverflow.com/questions/69671851/confidence-score-of-spacy-ner-custom-trained-and-pretrained-model – polm23 Oct 24 '21 at 03:45

1 Answer


I'm not really sure there's a question in your post, but yes, the spancat is available and you can get entity scores from it.

The spancat is a different component from the ner component. So if you do this:

pipeline = ["tok2vec","ner","spancat"]

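The spancat will not add scores to the entities your ner component predicted. The two components make independent predictions: ner writes to doc.ents, while spancat writes to doc.spans. You probably want to remove the ner component.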


For usage, please see the docs and the example project. This is how you get the scores:

doc = nlp(text)
span_group = doc.spans["sc"]  # "sc" is the default spans_key; it can be changed in the config
scores = span_group.attrs["scores"]

# Note that `scores` is an array with one score for each span in the group
for span, score in zip(span_group, scores):
    print(score, span)
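
To get output in the "ADAM" ---> PERSON ---> 0.92 shape asked about in the comments, the same loop can also read each span's label. The following is only a minimal sketch: the helper name `spans_with_scores` is hypothetical, and it assumes `doc` comes from a pipeline with a trained spancat component whose spans are stored under the key passed as `spans_key` ("sc" is, as far as I know, spaCy's default; pass whatever key your config uses).

```python
def spans_with_scores(doc, spans_key="sc"):
    """Return (text, label, score) triples for the spans stored under spans_key.

    Hypothetical helper: `doc` is assumed to be a spaCy Doc produced by a
    pipeline with a trained spancat component, so that the span group
    carries a "scores" entry in its attrs.
    """
    span_group = doc.spans[spans_key]
    scores = span_group.attrs["scores"]  # one score per span, in the same order
    return [
        (span.text, span.label_, float(score))
        for span, score in zip(span_group, scores)
    ]


def print_spans_with_scores(doc, spans_key="sc"):
    """Print one line per predicted span: text, label, confidence score."""
    for text, label, score in spans_with_scores(doc, spans_key):
        print(f"{text!r} ---> {label} ---> {score:.2f}")
```

Calling `print_spans_with_scores(nlp(text))` prints one line per predicted span. Because spancat can predict overlapping spans, the same tokens may show up more than once with different labels.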
polm23
  • Is there a usage example that shows how the score can be obtained? I am a little confused about this. Will it be possible to get something like "ADAM" ---> PERSON ---> 0.92? – imhans33 Oct 24 '21 at 05:21
  • Edited answer to include example. – polm23 Oct 24 '21 at 09:50
  • @polm23 Can you please provide an A-Z tutorial that addresses this kind of issue? How can one train a spancat model starting from a blank spaCy model with custom labels? – SteveS Sep 19 '22 at 13:56