
I have a spaCy NER model, and I am trying to extract each entity type it identifies in a dataframe column into a separate column. For example, to create and populate the 'GPE' and 'PERSON' columns:

Text                         GPE      PERSON
random text London George    London   George

However, I'm not sure how to do this and match the entities to their corresponding labels. I've tried a few approaches with no success, such as:

def person(v):
    if 'PERSON' in [ent.label_ for ent in ner_model(v).ents]:
        return [ent.text for ent in ner_model(v).ents]


df['Person'] = df['Text'].apply(lambda v: person(v))

df.head()

This just returns a list of all entities in the text, and I have been unable to get code containing == 'PERSON' to work. Has anyone else solved this issue? I'd appreciate any help!

Jon
  • I'm not sure exactly what you're trying to do but you seem to assume that there's one entity per label per sentence, which is absolutely not guaranteed to be the case. – polm23 Sep 27 '21 at 05:29

1 Answer


I think I've got a solution. First, extract all entities, together with their labels, into a column:

def all_ents(v):
    return [(ent.text, ent.label_) for ent in ner_model(v).ents]

df['Entities'] = df['Combined'].apply(lambda v: all_ents(v))

df.head()

Then go over this column and keep only the entities with the label you want, e.g.:

def person(v):
    # v is a list of (text, label) tuples; keep only the PERSON texts
    return [text for text, label in v if label == 'PERSON']

df['Person'] = df['Entities'].apply(lambda v: person(v))

df.head()

Seems to be working!
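
As the comment on the question points out, a text can contain several entities with the same label, so each cell needs to hold a list rather than a single value. To avoid writing one function per label, the filtering step could be generalised roughly as below. This is only a sketch: group_by_label and the sample row are names I made up for illustration, and it assumes each row of the 'Entities' column is a list of (text, label) tuples as produced above.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [text, ...]}, keeping order."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# Hypothetical sample in the same shape as one row of the 'Entities' column
row = [("London", "GPE"), ("George", "PERSON"), ("Paris", "GPE")]
grouped = group_by_label(row)

print(grouped.get("GPE", []))     # ['London', 'Paris']
print(grouped.get("PERSON", []))  # ['George']
print(grouped.get("ORG", []))     # []
```

With the dataframe above, something like df['GPE'] = df['Entities'].apply(lambda e: group_by_label(e).get('GPE', [])) should then fill one column per label, with an empty list where a text has no entity of that type.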

Jon