2

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'source': ['Paul', 'Paul'],
                   'target': ['GOOGLE', 'Ferrari'],
                   'edge': ['works at', 'drive']
                   })

df
  source   target      edge
0   Paul   GOOGLE  works at
1   Paul  Ferrari     drive

I want to apply Named-Entity Recognition (NER) to the source and target columns.

Expected outcome:

    source  target        edge
0   PERSON  ORGANIZATION  works at
1   PERSON  CAR           drive

I tried the following function:

!python -m spacy download en_core_web_sm

import spacy
nlp = spacy.load('en_core_web_sm')

def ner(df):
    df['source_entities'] = df['source'].apply(lambda x: nlp(x).label_)
    df['target_entities'] = df['target'].apply(lambda x: nlp(x).label_)
    return df

But when I call the function ner(df) I get back an error:

AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'label_'

Any ideas on how to reach the expected outcome?

xavi
  • What is `nlp`? Please edit the question to include your imports and definitions and make it a minimal, reproducible example, so that we can better understand it – G. Anderson Jul 25 '22 at 16:21

2 Answers

4

The simplest way to do this is to use the apply method and read the entities from the ents attribute of the Doc that nlp returns.

import pandas as pd
import spacy

nlp = spacy.load('en_core_web_sm')
df = pd.DataFrame({'Text':['The University of Cambridge is a collegiate research university in Cambridge, United Kingdom.', "Cambridge is ranked among the most prestigious universities in the world and currently sits as the world's second best university, and the best in Europe, according to the QS World University Rankings."]})
df['Entities'] = df['Text'].apply(lambda sent: [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in nlp(sent).ents])  
df['Entities'][1]

output:

[('Cambridge', 0, 9, 'GPE'),
 ('second', 107, 113, 'ORDINAL'),
 ('Europe', 147, 153, 'LOC'),
 ('the QS World University Rankings', 168, 200, 'ORG')]
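
If the dataframe has many rows, calling nlp once per row inside apply can be slow. Below is a minimal sketch of the same extraction using nlp.pipe, which processes the texts in batches. The function name extract_entities and its batch_size default are made up here for illustration, and nlp is assumed to be a loaded spaCy pipeline as above:

```python
def extract_entities(texts, nlp, batch_size=64):
    """Collect (text, start_char, end_char, label_) tuples for every entity in each text.

    `texts` is any iterable of strings (for example a pandas Series) and
    `nlp` is assumed to be a loaded spaCy pipeline. nlp.pipe yields one Doc
    per input text, in the same order as the input.
    """
    return [
        [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
        for doc in nlp.pipe(texts, batch_size=batch_size)
    ]
```

With the dataframe above this could be used as df['Entities'] = extract_entities(df['Text'], nlp), which should fill the same column as the apply version.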
meti
  • The answer does not match the expected values. It only adds one more column containing the information shown in the output – xavi Jul 26 '22 at 10:32
  • Yup! I added a new column to the dataframe named `Entities` which contains NEs for the sentence. You can simply get the list of it for each row using `df['Entities'].to_list()` – meti Jul 26 '22 at 11:54
2

You are trying to read the label_ attribute from the Doc object that nlp(x) returns. A Doc has no label_ attribute; the labels belong to the entity spans in doc.ents, which is why you get that error.

Replace

def ner(df):
  df['source_entities'] = df['source'].apply(lambda x: nlp(x).label_)
  df['target_entities'] = df['target'].apply(lambda x: nlp(x).label_)
  return df

With

def ner(df):
  df['source_entities'] = df['source'].apply(lambda x: [ent.label_ for ent in nlp(x).ents])
  df['target_entities'] = df['target'].apply(lambda x: [ent.label_ for ent in nlp(x).ents])
  return df
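
The version above stores a list of labels per cell, whereas the expected outcome in the question has a single label per cell. Below is a minimal sketch that keeps only the first label and falls back to the original text when no entity is found. The helper names first_entity_label and label_columns are made up here for illustration, and nlp is assumed to be the pipeline loaded in the question:

```python
def first_entity_label(doc, default=None):
    """Return the label of the first entity in a spaCy Doc, or `default` if there is none."""
    ents = doc.ents
    return ents[0].label_ if ents else default


def label_columns(df, nlp, columns=("source", "target")):
    """Return a copy of `df` where each listed column holds one entity label per cell.

    `nlp` is assumed to be a loaded spaCy pipeline. Cells in which the model
    finds no entity keep their original text.
    """
    out = df.copy()
    for col in columns:
        out[col] = out[col].apply(
            lambda text: first_entity_label(nlp(text), default=text)
        )
    return out
```

It would be called as label_columns(df, nlp). Note that en_core_web_sm uses the label ORG rather than ORGANIZATION and has no CAR label, and that short strings without context may not be recognised as entities at all, so the result will not necessarily match the expected table exactly.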
Nandan Rana