I am trying to extract quotations and their attributions (i.e., the speaker) from text with textacy, but I am getting errors. Here is the setup:
```python
import textacy
import pandas as pd
import spacy

data = [
    ("\"Hello, nice to meet you,\" said world 1"),
    ("\"Hello, nice to meet you,\" said world 2"),
]
df = pd.DataFrame(data, columns=['text'])
nlp = spacy.load('en_core_web_sm')
doc = df['text'].apply(nlp)
```
Here is the desired output:

```
[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")] [DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]
```
Here is my first attempt at extraction:

```python
print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))
```

This gives the following output:

```
[<generator object direct_quotations at 0x7f82edf58ac0>, <generator object direct_quotations at 0x7f82edf58190>]
```
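The generator objects appear because `direct_quotations` yields its results lazily: the generator expression produces one unconsumed generator per row, and the outer `list()` only materializes that outer level. Note also that the loop variable `records` is never used, so every call receives the whole `doc` Series rather than a single document. The sketch below reproduces the pattern with a hypothetical stand-in, `quotations`, that simply yields text between double quotes; it is an illustration of the generator behaviour only, not textacy's extraction logic.

```python
def quotations(text):
    # Hypothetical stand-in for a generator-based extractor such as
    # textacy.extract.triples.direct_quotations: it yields results lazily.
    # Here it simply yields every substring enclosed in double quotes.
    for quoted in text.split('"')[1::2]:
        yield quoted


texts = [
    '"Hello, nice to meet you," said world 1',
    '"Hello, nice to meet you," said world 2',
]

# Mirrors the first attempt: list() materializes only the outer generator
# expression, so each element is still an unconsumed generator object.
nested = list(quotations(t) for t in texts)
print(nested)

# Consuming each inner generator gives one list of results per text.
per_text = [list(quotations(t)) for t in texts]
print(per_text)  # [['Hello, nice to meet you,'], ['Hello, nice to meet you,']]
```

Because nothing inside a generator runs until it is iterated, this pattern also hides errors, which is why the first attempt prints no exception.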
Here is my second attempt at extraction:

```python
print(list(textacy.extract.triples.direct_quotations(doc)))
```

This raises the following error:

```
AttributeError: 'Series' object has no attribute 'lang_'
```