Extract quotations using textacy

Question

I am attempting to extract quotations and quotation attributions (i.e., the speaker) from text, but I am getting errors. Here is the setup:

import textacy
import pandas as pd
import spacy

data = [
    "\"Hello, nice to meet you,\" said world 1",
    "\"Hello, nice to meet you,\" said world 2",
]

df = pd.DataFrame(data, columns=['text'])

nlp = spacy.load('en_core_web_sm')

doc = df['text'].apply(nlp)  # a pandas Series of spaCy Docs, one per row

Here is the desired output:

[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")] [DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]

Here is the first attempt at extraction:

print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))

Which gives the following output:

[<generator object direct_quotations at 0x7f82edf58ac0>, <generator object direct_quotations at 0x7f82edf58190>]

Here is the second attempt at extraction:

print(list(textacy.extract.triples.direct_quotations(doc)))

Which gives the following error:

AttributeError: 'Series' object has no attribute 'lang_'

score 0 · Answer 1 · answered Jun 17 '22 at 09:32

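In your first attempt, the generator expression loops over doc but passes the whole doc Series to direct_quotations on every iteration (records is never used), and the generators it creates are never iterated, so print only shows generator objects. direct_quotations expects a single spaCy Doc, and its result has to be consumed, for example with list().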

Here is an example of what you could do:

import spacy
import textacy

text = """ "Hello, nice to meet you," said world 1"""

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]
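To make the difference between the two attempts concrete, here is a pure-Python sketch. It assumes (consistent with the generator objects and the traceback in the question) that textacy's direct_quotations is a generator function that only touches doc.lang_ once iteration starts. DQTriple, FakeDoc and direct_quotations_stub are hypothetical stand-ins, not textacy's API, and the "said" parsing is a toy; the point is the lazy evaluation and the one-call-per-Doc fix.

```python
from collections import namedtuple

# Hypothetical stand-in for textacy's DQTriple (speaker, cue, content).
DQTriple = namedtuple("DQTriple", ["speaker", "cue", "content"])


class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc exposing only .text and .lang_."""

    def __init__(self, text):
        self.text = text
        self.lang_ = "en"


def direct_quotations_stub(doc):
    """Hypothetical generator mimicking how direct_quotations is evaluated.

    Because the body contains `yield`, none of it runs until the generator
    is iterated, so a wrong argument is only noticed on next()/list().
    """
    _ = doc.lang_  # raises AttributeError for objects without .lang_
    # Toy parsing that assumes the exact pattern '"<quote>" said <speaker>'.
    quote, _sep, speaker = doc.text.partition('" said ')
    yield DQTriple(speaker=[speaker], cue=["said"], content=quote.lstrip('"'))


docs = [
    FakeDoc('"Hello, nice to meet you," said world 1'),
    FakeDoc('"Hello, nice to meet you," said world 2'),
]

# Like the first attempt: the whole container is passed and nothing is
# iterated, so this only builds generator objects and raises no error.
unconsumed = [direct_quotations_stub(docs) for records in docs]
print(unconsumed)

# Like the second attempt: consuming the generator surfaces the error.
error_message = None
try:
    list(direct_quotations_stub(docs))
except AttributeError as exc:
    error_message = str(exc)
    print("AttributeError:", error_message)

# Fix: one call per document, each generator consumed with list().
results = [list(direct_quotations_stub(d)) for d in docs]
print(results)
```

With the real objects from the question, the equivalent per-row fix would be roughly df['text'].apply(nlp).apply(lambda d: list(textacy.extract.triples.direct_quotations(d))), which should give one list of triples per row (untested against a specific textacy version).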

score 0 · Answer 2 · edited May 14 '23 at 11:22


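Since direct_quotations returns a generator object, you have to consume it. To get the first quotation, use

next(textacy.extract.triples.direct_quotations(doc))

or wrap the call in list() to get all of them. In either case doc must be a single spaCy Doc, not the Series returned by df['text'].apply(nlp).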
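The next() versus list() distinction can be shown without spaCy. In this sketch quotations_stub is a hypothetical generator standing in for direct_quotations(doc), and DQTriple is a hypothetical namedtuple mirroring the printed result; it shows that next() returns only the first item, that next() needs a default when a text has no quotations, and that a generator can only be consumed once.

```python
from collections import namedtuple

# Hypothetical stand-in for textacy's DQTriple (speaker, cue, content).
DQTriple = namedtuple("DQTriple", ["speaker", "cue", "content"])


def quotations_stub(triples):
    """Hypothetical generator standing in for direct_quotations(doc)."""
    for triple in triples:
        yield triple


found = [
    DQTriple(["world 1"], ["said"], "Hello, nice to meet you,"),
    DQTriple(["world 2"], ["said"], "Hello, nice to meet you,"),
]

first = next(quotations_stub(found))       # only the first triple
everything = list(quotations_stub(found))  # every triple

# With no quotations, bare next() raises StopIteration; a default avoids that.
nothing = next(quotations_stub([]), None)

# A generator is single-use: after list() exhausts it, nothing is left.
gen = quotations_stub(found)
consumed = list(gen)
leftover = list(gen)

print(first)
print(len(everything), nothing, leftover)
```

If you call the real function more than once on the same result, store it in a list first rather than reusing the generator.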

answered May 09 '23 at 01:16 by jeong · edited May 14 '23 at 11:22 by kakou
