How to count the occurrences of all entities in a Python spaCy document?

Question

I am using scispaCy to identify all entities in a text document. From this, I would like to return a two-column data frame: the left column lists the unique entities found in the document, and the right column gives the number of times each entity appears. How can I do this with spaCy?

Would this help? https://stackoverflow.com/questions/37253326/how-to-find-the-most-common-words-using-spacy — Stef, Mar 23 '22 at 15:49
Not really. It's not counting the individual entities, just the complete set. — Alokin, Mar 23 '22 at 16:52

score 0 · Answer 1 · answered Mar 24 '22 at 03:45


You can just count things yourself.

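import spacy
from collections import Counter

nlp = spacy.load(...)  # load your model, e.g. a scispaCy pipeline

text = ...  # your text

# Key each entity by its label and surface text, then tally occurrences
ents = Counter()
for ent in nlp(text).ents:
    ents[f"{ent.label_}:{ent.text}"] += 1

# Print one line per unique entity: its count, then label:text
for key, val in ents.items():
    print(val, key, sep="\t")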
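The question asks for a two-column data frame rather than printed lines. The following is a minimal sketch, assuming pandas is available, that turns the counts into a `DataFrame` with one row per unique entity text. `Ent` and `entity_counts_frame` are hypothetical names introduced here: `Ent` only stands in for a spaCy entity span so the example runs without a model, and with a real pipeline you would pass `nlp(text).ents` instead. The sample entities are made up.

```python
from collections import Counter
from typing import Iterable, NamedTuple

import pandas as pd


class Ent(NamedTuple):
    """Hypothetical stand-in for a spaCy entity span; only .text and .label_ are used."""
    text: str
    label_: str


def entity_counts_frame(ents: Iterable) -> pd.DataFrame:
    """Return a two-column DataFrame: each unique entity text and how often it occurs."""
    counts = Counter(ent.text for ent in ents)
    # most_common() yields (entity, count) pairs sorted by descending count
    return pd.DataFrame(counts.most_common(), columns=["entity", "count"])


# With a real pipeline this would be: entity_counts_frame(nlp(text).ents)
sample = [Ent("BRCA1", "GENE"), Ent("cancer", "DISEASE"), Ent("BRCA1", "GENE")]
df = entity_counts_frame(sample)
print(df)
```

This keys on the entity text alone, so the same string tagged with different labels is counted as one entity, and matching is case-sensitive; keying on `f"{ent.label_}:{ent.text}"` as in the answer above keeps labels separate.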

answered by polm23

It looks a bit counterintuitive to create an empty `Counter()` then fill it manually, rather than initialising it directly with the iterable. – Stef Mar 24 '22 at 13:15
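Following that suggestion, the counter can be built in one step by passing a generator straight to `Counter`. This is a sketch under the same assumptions as before: `Ent` and `count_entities` are hypothetical names, `Ent` stands in for a spaCy span so the example runs without a model (pass `nlp(text).ents` in real use), and the sample entities and labels are made up.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Ent(NamedTuple):
    """Hypothetical stand-in for a spaCy entity span; only .text and .label_ are read."""
    text: str
    label_: str


def count_entities(ents: Iterable) -> Counter:
    # Initialise the Counter directly from a generator instead of incrementing by hand
    return Counter(f"{ent.label_}:{ent.text}" for ent in ents)


ents = count_entities(
    [Ent("aspirin", "CHEMICAL"), Ent("aspirin", "CHEMICAL"), Ent("fever", "DISEASE")]
)
# most_common() lists the most frequent entities first
for key, val in ents.most_common():
    print(val, key, sep="\t")
```

The result is the same `Counter` as the manual loop produces; only the construction differs.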
