I am using scispaCy to identify all entities in a text document. From these, I would like to build a two-column data frame: the left column lists the unique entities found in the document, and the right column gives the number of times each entity appears in it. How can I do this using spaCy?
-
Would this help? https://stackoverflow.com/questions/37253326/how-to-find-the-most-common-words-using-spacy – Stef Mar 23 '22 at 15:49
-
Also: https://github.com/explosion/spaCy/issues/139 – Stef Mar 23 '22 at 15:50
-
Not really. It's not counting the individual entities, just the complete set. – Alokin Mar 23 '22 at 16:52
-
Could you give us a sample of your data and the output you expect? – Talha Tayyab Mar 24 '22 at 02:56
1 Answer
You can count the entities yourself with `collections.Counter`.
import spacy
from collections import Counter

nlp = spacy.load(...)  # load your model
text = ...  # your text

# Count each entity, keyed by its label and surface text
ents = Counter()
for ent in nlp(text).ents:
    ents[f"{ent.label_}:{ent.text}"] += 1

# Print the count, then "LABEL:text", separated by a tab
for key, val in ents.items():
    print(val, key, sep="\t")
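Since the question asks for a two-column data frame rather than printed output, the `ents` Counter from the code above could be loaded into pandas. This is a minimal sketch assuming pandas is installed; the helper name `counter_to_frame`, the column names `entity`/`count`, and the sample keys are hypothetical and chosen only for illustration.

```python
from collections import Counter

import pandas as pd


def counter_to_frame(counts: Counter) -> pd.DataFrame:
    # most_common() returns (key, count) pairs ordered by descending count,
    # which maps directly onto a two-column DataFrame.
    return pd.DataFrame(counts.most_common(), columns=["entity", "count"])


# Stand-in for the `ents` Counter built from nlp(text).ents (made-up keys).
ents = Counter({"ENTITY:aspirin": 3, "ENTITY:ibuprofen": 1})
df = counter_to_frame(ents)
print(df)
```

Using `most_common()` rather than `items()` sorts the frame so the most frequent entities come first. If you want only the entity text in the left column, change the Counter key from `f"{ent.label_}:{ent.text}"` to `ent.text`.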

polm23
-
It looks a bit counterintuitive to create an empty `Counter()` then fill it manually, rather than initialising it directly with the iterable. – Stef Mar 24 '22 at 13:15
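Following that suggestion, `Counter` accepts any iterable and tallies each element, so the empty-Counter-plus-loop can collapse into a single generator expression. A minimal sketch: the function name `count_entities` is hypothetical, and the `SimpleNamespace` objects are stand-ins for spaCy spans (only `.text` and `.label_` are used); with a real pipeline you would pass `nlp(text).ents`.

```python
from collections import Counter
from types import SimpleNamespace


def count_entities(ents) -> Counter:
    # Build the Counter directly from a generator of "LABEL:text" keys
    # instead of creating it empty and incrementing in a loop.
    return Counter(f"{ent.label_}:{ent.text}" for ent in ents)


# Hypothetical stand-ins for spaCy Span objects.
fake_ents = [
    SimpleNamespace(text="aspirin", label_="ENTITY"),
    SimpleNamespace(text="aspirin", label_="ENTITY"),
    SimpleNamespace(text="fever", label_="ENTITY"),
]
counts = count_entities(fake_ents)
print(counts.most_common())  # [('ENTITY:aspirin', 2), ('ENTITY:fever', 1)]
```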