For a Named Entity Recognition task in Dutch with spaCy, I added entities using EntityRuler. When I add the ruler to the pipeline in my notebook:

import spacy

nlp = spacy.load("nl_core_news_md")
ruler = nlp.add_pipe("entity_ruler", before="ner")
# complete_dicts is a list of dictionaries, e.g.
# [{"label": "PERSON", "pattern": "Staf Aerts"}, {"label": "PERSON", "pattern": "Meyrem Almaci"}]
patterns = complete_dicts
ruler.add_patterns(patterns)

the NER pipeline works very well. However, when I save it to disk and then load the model again using

nlp.from_disk("path/to_model")

the model misses entities that are added through the EntityRuler.

I found nothing in the documentation that explains why this would happen. I would be grateful to anyone who can explain it. Thanks!

polm23
IneG
  • That should just work. How are you actually saving and loading the model? You should be using `spacy.from_disk`, for example. In the saved model directory there should be an `entity_ruler/patterns.jsonl` file that you can check to confirm it contains your patterns. – polm23 Dec 14 '22 at 04:32
  • Hi! To save the model, I did `nlp.to_disk("path/to_model")`. For loading, `nlp.from_disk("path/to_model")`. The saving indeed creates a folder with all the steps, and within the `entity_ruler` there is the `patterns.jsonl` file. These are the right patterns, which is why I am confused this does not work! – IneG Dec 14 '22 at 07:09
  • Ah, I meant `to_disk` in my comment. Sorry for not catching the `from_disk` vs `load` bit. – polm23 Dec 14 '22 at 09:28

1 Answer

To load a saved model, use spacy.load:

nlp = spacy.load("/path/to/model")

More details about how spacy.load works (including nlp.from_disk): https://spacy.io/usage/processing-pipelines#pipelines
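
For illustration, here is a minimal sketch of the full save and reload round trip. It assumes a blank Dutch pipeline (spacy.blank("nl")) in place of nl_core_news_md so that it runs without a model download, and a temporary directory in place of the real model path; both are stand-ins, not part of the original question. As far as I understand it, nlp.from_disk only restores data for components that already exist on the nlp object it is called on, whereas spacy.load first rebuilds the pipeline from the saved config.cfg, so the entity_ruler component is recreated before its patterns are read.

```python
import tempfile
from pathlib import Path

import spacy

# Assumption: a blank Dutch pipeline stands in for "nl_core_news_md",
# so there is no "ner" component and the ruler is simply appended.
nlp = spacy.blank("nl")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Staf Aerts"},
    {"label": "PERSON", "pattern": "Meyrem Almaci"},
])

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical location; use your own model directory instead.
    model_dir = Path(tmp) / "model"
    nlp.to_disk(model_dir)

    # spacy.load reads config.cfg, rebuilds every component
    # (including entity_ruler) and then loads the saved data.
    reloaded = spacy.load(model_dir)

doc = reloaded("Staf Aerts sprak met Meyrem Almaci.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(reloaded.pipe_names)
print(entities)
```

If entity_ruler shows up in reloaded.pipe_names, the patterns were restored together with the rest of the pipeline.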

aab