I want to count the occurrences of the token "and" in each sentence of a corpus using spaCy, and append the count for each sentence to a list. So far, the code below returns only the total number of "and" tokens for the whole corpus.

Desired output for 3 sentences: ['1', '0', '2']

Current output: [3]

doc = nlp(corpus)
nb_and = []
for sent in doc.sents:
    i = 0
    for token in sent:
        if token.text == "and":
            i += 1
            nb_and.append(i)
Vy Do
Artemis

1 Answer

You need to append i to nb_and once per sentence, after the inner loop over its tokens has finished, rather than inside the if block:

for sent in doc.sents:
    i = 0
    for token in sent:
        if token.text == "and":
            i += 1
    nb_and.append(i)

Test code:

import spacy
nlp = spacy.load("en_core_web_trf")
corpus = "I see a cat and a dog. None seems to be unhappy. My mother and I wanted to buy a parrot and a tortoise."
doc = nlp(corpus)
nb_and = []
for sent in doc.sents:
    i = 0
    for token in sent:
        if token.text == "and":
            i += 1
    nb_and.append(i)

nb_and
# => [1, 0, 2]
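
The same per-sentence counting can also be wrapped in a small helper. This is only a sketch: the function name count_token_per_sentence and its ignore_case parameter are made up for illustration and are not part of spaCy. It relies only on doc.sents and token.text, as used in the code above.

```python
def count_token_per_sentence(doc, target="and", ignore_case=False):
    """Return one count of `target` per sentence in `doc`.

    `doc` is assumed to expose `.sents` (an iterable of sentences),
    and each token in a sentence is assumed to expose `.text`.
    """
    def norm(text):
        # Optionally fold case so that "And" also matches "and".
        return text.lower() if ignore_case else text

    wanted = norm(target)
    # One sum per sentence: the count restarts for every sentence.
    return [
        sum(1 for token in sent if norm(token.text) == wanted)
        for sent in doc.sents
    ]
```

Called as count_token_per_sentence(doc) on the doc from the test code, it should give the same list as nb_and. Passing ignore_case=True would additionally count a sentence-initial "And", which the exact comparison token.text == "and" does not match.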
Wiktor Stribiżew