Why does a pretrained spacy pipe not work when added to a spacy.blank pipe?

Question

I am trying to add spaCy's pretrained parser (and morphologizer) for Norwegian Bokmål to a blank spaCy pipeline. I get no error message when I add the components, but whatever the input, the pipeline tags every token as a noun. What am I missing here?

import spacy
from spacy import displacy

nlp = spacy.blank("nb")
wanted_pipes = ["morphologizer", "parser"]
source_nlp = spacy.load("nb_core_news_sm")  # load the trained pipeline once

for pipe_name in wanted_pipes:
    if pipe_name not in nlp.pipe_names:
        nlp.add_pipe(pipe_name, source=source_nlp)
nlp.initialize()
doc = nlp("Katten heter Petrus.")  # a random Norwegian sentence

score 1 · Answer 1 · answered Sep 27 '22 at 06:33

There are a couple of problems with the way you're loading the pipeline here. One is that the morphologizer and parser need the tok2vec component to give them meaningful input. Another is that calling nlp.initialize() wipes their trained weights.

A better way to load the pipeline is to use disable to exclude just the components you don't want, like this:

nlp = spacy.load("nb_core_news_sm", disable=["lemmatizer", "ner"])

I would recommend leaving the attribute_ruler in, because it's fast and often works together with the morphologizer.

Also, it should be easier to use enable rather than disable to list the components you want to keep, but there's currently a bit of an issue with that. We're working on a fix; see here for details.
