
I have to process hundreds of thousands of texts, and I have found that the thing taking the longest is the following:

from spacy.lang.en import English
from spacy.pipeline import EntityRuler

nlp = English()
ruler = EntityRuler(nlp)
patterns = [...]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)
...
# This line takes longer than I would like
doc = nlp(whole_chat)

Granted, I have many patterns, but is there a way to speed this up? I only have the EntityRuler pipe, no others.

formicaman
  • For anyone coming here, there's now an official speed FAQ for spaCy with the advice from the answers here and more. https://github.com/explosion/spaCy/discussions/8402 – polm23 Jan 07 '22 at 05:11

2 Answers


By default, spaCy applies several models to your document: a POS tagger, a syntactic parser, NER, and possibly others such as a text categorizer.

Maybe you do not need some of these models. If that is the case, you can disable them, which will speed up your pipeline. You do this when loading the pipeline, like this:

nlp = spacy.load('en_core_web_sm', disable=['ner', 'parser'])

Or, following @oleg-ivanytskyi's answer, you can disable these models in the nlp.pipe() call:

nlp = spacy.load("en_core_web_sm")
for doc in nlp.pipe(texts, disable=["tagger", "parser"]):
    # Do something with the doc here
    print([(ent.text, ent.label_) for ent in doc.ents])
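
Since the question's pipeline contains only an EntityRuler, the two tips combine naturally: start from a blank English pipeline (no tagger, parser, or NER to disable in the first place) and stream the texts through nlp.pipe(). A minimal sketch using the spaCy 3.x API, where add_pipe() takes the registered component name; the single pattern here is just an illustration standing in for the asker's many patterns:

```python
import spacy

# Blank English pipeline: tokenizer only, nothing else to slow things down
nlp = spacy.blank("en")

# In spaCy 3.x, add_pipe() takes the component's registered string name
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "spaCy"}])

texts = ["I love spaCy", "nothing to see here", "spaCy again"]

# Stream the documents in batches instead of calling nlp() once per text
for doc in nlp.pipe(texts, batch_size=1000):
    print([(ent.text, ent.label_) for ent in doc.ents])
```

With hundreds of thousands of texts, tuning batch_size (and n_process for multiprocessing) in nlp.pipe() is usually the next knob to try.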
David Dale

Use nlp.pipe() to process multiple texts; it is faster and more efficient (see the documentation).
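
A minimal sketch of the difference, assuming `texts` is your list of chats (placeholder data here):

```python
import spacy

nlp = spacy.blank("en")  # stand-in for your pipeline with the EntityRuler added

texts = ["first chat", "second chat", "third chat"]  # placeholder data

# Slower: one full pipeline call per text
docs_slow = [nlp(text) for text in texts]

# Faster: nlp.pipe() batches the texts internally and yields Doc objects
docs_fast = list(nlp.pipe(texts))
```

Both loops produce the same Doc objects; nlp.pipe() simply amortizes per-call overhead across the batch.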

Oleg Ivanytskyi