Spacy: count occurrence for specific token in each sentence

Question

I want to use spaCy to count the occurrences of the token "and" in each sentence of a corpus, and append the count for each sentence to a list. So far, the code below returns only the total number of "and" tokens for the whole corpus.

Example/desired output for 3 sentences: ['1', '0', '2']
Current output: [3]

doc = nlp(corpus)
nb_and = []
for sent in doc.sents:
    i = 0
    for token in sent:
        if token.text == "and":
            i += 1
            nb_and.append(i)

score 1 · Accepted Answer · answered Nov 03 '21 at 20:45

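You need to append i to nb_and once per sentence, after the inner loop over that sentence's tokens has finished. In the question's code the append is inside the if block, so it runs on every match rather than once per sentence. Move it out to the same indentation level as the inner for loop: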

for sent in doc.sents:
    i = 0
    for token in sent:
        if token.text == "and":
            i += 1
    nb_and.append(i)
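The same per-sentence counting can also be written more compactly. This is a minimal sketch that assumes doc is a spaCy Doc produced by nlp(corpus); the function name count_and_per_sentence is hypothetical. Each sentence produces exactly one number, so a sentence with no match contributes 0.

```python
def count_and_per_sentence(doc):
    """Return how many times the token "and" occurs in each sentence of doc.

    Assumes doc is a spaCy Doc (hypothetical helper, not a spaCy API).
    The generator yields 1 per matching token, sum() totals the matches
    for one sentence, and the list comprehension gives one count per sentence.
    """
    return [sum(1 for token in sent if token.text == "and") for sent in doc.sents]
```

For example, count_and_per_sentence(nlp(corpus)) should return the same list as the loop. The desired output in the question uses strings; if that is really required, [str(n) for n in counts] converts the integers.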

Test code:

import spacy
nlp = spacy.load("en_core_web_trf")
corpus = "I see a cat and a dog. None seems to be unhappy. My mother and I wanted to buy a parrot and a tortoise."
doc = nlp(corpus)
nb_and = []
for sent in doc.sents:
    i = 0
    for token in sent:
        if token.text == "and":
            i += 1
    nb_and.append(i)

nb_and
# => [1, 0, 2]
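The comparison token.text == "and" is case-sensitive, so a sentence that starts with "And" is not counted. The sketch below compares against spaCy's token.lower_ attribute instead. It assumes a spaCy Doc as input; count_word_per_sentence and its ignore_case parameter are hypothetical names, not part of spaCy.

```python
def count_word_per_sentence(doc, word, ignore_case=True):
    """Count occurrences of word as a whole token in each sentence of doc.

    Assumes doc is a spaCy Doc (hypothetical helper, not a spaCy API):
    anything with .sents whose tokens expose .text and .lower_ works.
    With ignore_case=True the comparison uses token.lower_, so a
    sentence-initial "And" is counted as well.
    """
    target = word.lower() if ignore_case else word
    counts = []
    for sent in doc.sents:
        if ignore_case:
            n = sum(1 for token in sent if token.lower_ == target)
        else:
            n = sum(1 for token in sent if token.text == target)
        counts.append(n)  # one entry per sentence, 0 when nothing matches
    return counts
```

Because it compares whole tokens rather than searching sent.text for a substring, words such as "hand" or "candy" are not counted as "and". The result still depends on how the loaded pipeline splits the text into sentences.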
