I am using scispaCy to identify all entities in a text document. From this, I would like to build a two-column data frame: the left column should list the unique entities found in the document, and the right column should give the number of times each entity appears. How can I do this using spaCy?

Alokin

1 Answer

You can just count the entities yourself with a `Counter`, keyed on each entity's label and text.

from collections import Counter

import spacy

nlp = spacy.load(...)  # load your model

text = ...  # your text

# Count every entity occurrence, keyed as "LABEL:text"
ents = Counter()
for ent in nlp(text).ents:
    ents[f"{ent.label_}:{ent.text}"] += 1

# Print one tab-separated line per unique entity: count, then key
for key, val in ents.items():
    print(val, key, sep="\t")
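
To get from the counter to the two-column data frame the question asks for, something like the following should work. This is a minimal sketch that assumes pandas is the data frame library in question (the question does not say). The helper name `entity_frame`, the column names, and the sample counts are all made up for illustration; in practice you would pass in the `ents` counter built above.

```python
from collections import Counter

import pandas as pd


def entity_frame(counts):
    """Build a two-column DataFrame (entity, count) from a Counter.

    Counter.most_common() with no argument returns every (key, count)
    pair, ordered from the most frequent to the least frequent.
    """
    return pd.DataFrame(counts.most_common(), columns=["entity", "count"])


# Made-up sample values standing in for the `ents` counter above
sample_counts = Counter({"DISEASE:asthma": 3, "CHEMICAL:aspirin": 1})
df = entity_frame(sample_counts)
```

If you would rather keep the label and the text in separate columns, you could key the counter on a `(ent.label_, ent.text)` tuple instead of a formatted string and unpack it when building the rows.
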
polm23
  • It looks a bit counterintuitive to create an empty `Counter()` then fill it manually, rather than initialising it directly with the iterable. – Stef Mar 24 '22 at 13:15
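
For reference, the direct initialisation suggested in that comment might look roughly like the sketch below. `count_entities` and `FakeEnt` are hypothetical names used only for illustration: `FakeEnt` is a stand-in with the same `label_` and `text` attributes as a spaCy entity span, so the snippet runs without loading a model. With a real pipeline you would call `count_entities(nlp(text).ents)`.

```python
from collections import Counter, namedtuple


def count_entities(ents):
    """Count entities by passing a generator expression straight to Counter."""
    return Counter(f"{ent.label_}:{ent.text}" for ent in ents)


# Hypothetical stand-in for spaCy entity spans (made-up sample values)
FakeEnt = namedtuple("FakeEnt", ["label_", "text"])
sample_ents = [
    FakeEnt("DISEASE", "asthma"),
    FakeEnt("CHEMICAL", "aspirin"),
    FakeEnt("DISEASE", "asthma"),
]

counts = count_entities(sample_ents)
```

Both versions produce the same counts; the difference is only in style, since `Counter` accepts any iterable of hashable items.
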