I am attempting to extract quotations and quotation attributions (i.e., the speaker) from text, but I am getting errors. Here is the setup:

import textacy
import pandas as pd
import spacy

data = [
        ("\"Hello, nice to meet you,\" said world 1"),
        ("\"Hello, nice to meet you,\" said world 2"),  
        ]

df = pd.DataFrame(data, columns=['text'])

nlp = spacy.load('en_core_web_sm')

doc = df['text'].apply(nlp)

Here is the desired output:

[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")] [DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]

Here is the first attempt at extraction:

print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))

Which gives the following output:

[<generator object direct_quotations at 0x7f82edf58ac0>, <generator object direct_quotations at 0x7f82edf58190>]

Here is the second attempt at extraction:

print(list(textacy.extract.triples.direct_quotations(doc)))

Which gives the following error:

AttributeError: 'Series' object has no attribute 'lang_'

jedmund
2 Answers

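In your first attempt, the generator expression loops over the Series but passes the whole Series (doc) to direct_quotations on every iteration instead of the individual Doc (records). Because direct_quotations is a generator function, nothing runs until the generator is consumed, so list() only collects two unconsumed generator objects. In your second attempt, the generator is consumed, and it fails because a pandas Series has no lang_ attribute: direct_quotations expects a single spaCy Doc.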

Here is an example of what you could do:

import textacy

import spacy

text = """ "Hello, nice to meet you," said world 1"""

nlp = spacy.load("en_core_web_sm")

doc = nlp(text)

print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]
Bill

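direct_quotations returns a generator, so you have to consume it to see any results. To get the first triple, you can use

next(textacy.extract.triples.direct_quotations(doc))

where doc is a single spaCy Doc, not a pandas Series. To get every triple rather than only the first, wrap the call in list() instead of next().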
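
The difference between printing a generator, calling next() on it, and wrapping it in list() can be seen without spaCy at all. This is a small illustrative sketch; fake_quotations is a hypothetical stand-in for any generator-based extractor and is not part of textacy.

```python
from typing import Iterator


def fake_quotations(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a generator-based extractor (not a textacy function)."""
    for part in text.split("|"):
        yield part


gen = fake_quotations("first|second")
print(gen)  # shows a generator object; no items have been produced yet

first = next(gen)  # pulls only the first item
rest = list(gen)   # consumes whatever is left in the same generator

# A fresh generator wrapped in list() yields every item at once.
everything = list(fake_quotations("first|second"))
print(first, rest, everything)
```

Note that next() raises StopIteration on an empty generator, so list() is usually the safer choice when a text may contain no quotations.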

kakou
jeong