How to select only first entity extracted from spacy entities?

Question

I am trying to use the following code to extract entities from text stored in a DataFrame.

for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':
            ...  # placeholder: this is where the first GPE should be stored

I need to store the text of the first GPE alongside its corresponding row of text. For instance, if the following is the text at index 0 in column df['Text']:

Match between USA and Canada was postponed

then I need only the first location (USA) in another column, such as df['Place'], at the same index as the text, which is 0. df['Place'] does not exist in the DataFrame yet, so it will be created when the value is assigned. I have tried the following code, but it fills the whole column with the very first value it can find.

for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':
            df['Place'] = (entity.text)

I have also tried appending the text to a list with e_list.append(entity.text), but that appends every entity it can find in the text. Can someone help me store only the first entity at the corresponding index? Thank you.

Maybe all you need is `df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])`? — Wiktor Stribiżew, Dec 22 '20 at 10:19
Or, `df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])`? — Wiktor Stribiżew, Dec 22 '20 at 10:29
Both are working first one is giving all entities and second one is giving only first. Can you please add it to answer so I can mark it. Thank you @Wiktor Stribizew — weezx, Dec 22 '20 at 10:40

score 1 · Accepted Answer · answered Dec 22 '20 at 10:43

You can get all the GPE entities for each entry by using Series.apply on the Text column:

df['Place'] = df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])

If you are only interested in the first entity from each entry, use the following; entries without a GPE get an empty string:

df['Place'] = df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
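The `or ['']` part works because an empty list is falsy, so the fallback list supplies an empty string when no GPE is found. As an alternative sketch, `next()` with a default stops at the first match instead of building the full list. The helper name `first_entity_text` and its parameters are hypothetical, not part of spaCy; it only assumes that each entity has `.text` and `.label_` attributes, as the items of `doc.ents` do.

```python
def first_entity_text(ents, label="GPE", default=""):
    """Return the text of the first entity with the given label, else default."""
    # The generator is consumed lazily, so iteration stops at the first match.
    return next((ent.text for ent in ents if ent.label_ == label), default)
```

With the DataFrame from the question, this could be applied as `df['Place'] = df['Text'].apply(lambda x: first_entity_text(nlp(x).ents))`.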

Here is a test snippet for both apply calls:

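import spacy
import pandas as pd
# Assumption: the small English model is installed; any English pipeline with an NER component should work
nlp = spacy.load('en_core_web_sm')
df = pd.DataFrame({'Text':['Match between USA and Canada was postponed', 'No ents']})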
df['Text'].apply(lambda x: [entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'])
# => 0    [USA, Canada]
#    1               []
#    Name: Text, dtype: object
df['Text'].apply(lambda x: ([entity.text for entity in nlp(x).ents if entity.label_ == 'GPE'] or [''])[0])
# => 0    USA
#    1       
#    Name: Text, dtype: object
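For larger DataFrames, calling `nlp(x)` once per row inside `apply` can be slow. spaCy's `nlp.pipe` processes an iterable of texts in batches and yields the `Doc` objects in input order, so the results line up with the rows. A minimal sketch, where the function name `first_entities` and its parameters are hypothetical:

```python
def first_entities(nlp, texts, label="GPE", default=""):
    """Return, for each text, the text of its first entity with the given label."""
    return [
        # next() stops at the first matching entity and falls back to default
        next((ent.text for ent in doc.ents if ent.label_ == label), default)
        for doc in nlp.pipe(texts)
    ]
```

`df['Place'] = first_entities(nlp, df['Text'])` then assigns one value per row, assuming `nlp` is a loaded spaCy pipeline and the Text column contains only strings.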
