I'm applying the `list` function to a pandas column that contains generator objects, in an attempt to display the items each generator yields. When I do, the column comes back as empty lists. The generators come from `subject_verb_object_triples`, a textacy function (https://chartbeat-labs.github.io/textacy/_modules/textacy/extract.html).

print(sp500news3)

date_publish    title
79944   2007-01-29 19:08:35 <generator object subject_verb_object_triples at 0x1a42713550>
181781  2007-12-14 19:39:06 <generator object subject_verb_object_triples at 0x1a42713410>
213175  2008-01-22 11:17:19 <generator object subject_verb_object_triples at 0x1a427135f0>
93554   2008-01-22 18:52:56 <generator object subject_verb_object_triples at 0x1a427135a0>

In []: sp500news3["title"].apply(list)
Out []: 79944     []
        181781    []
        213175    [] ...

The expected output is a list of tuples per row, such as the following:

[(Sky proposal, is, matter), (Sky proposal, is, Mays spokesman)], 
[(Women, lag, Intel report)], 
[(Amazon, expected, to unveil)], 
[(Goldman Sachs, raising, billion)], 
[(MHP, opens, books)], 
[(Disney, hurls, magic), (Disney, hurls, moolah)], 
[(Amazon, offering, loans), (Amazon, offering, to)], ....

How can I display the expected output in my dataframe?

W.R
  • What is the expected output? Is there a question? How can we help? – Benoît P Feb 04 '19 at 14:04
  • @BenoîtPilatte - have updated q – W.R Feb 04 '19 at 14:07
  • Could you use a `lambda` here? `lambda x: [a for a in x]` – C.Nivs Feb 04 '19 at 14:07
  • You probably want `sp500news3["title"].apply(lambda x: list(x))` – Josh Friedlander Feb 04 '19 at 14:28
  • @JoshFriedlander this still returns empty lists – W.R Feb 04 '19 at 15:35
  • as does the suggestion from @C.Nivs – W.R Feb 04 '19 at 15:39
  • Were the generators already consumed? @C.Nivs's solution `sp500news3["title"].apply(lambda x: [a for a in x])` should give you the expected output. Run fresh data against this line and report your findings. – r.ook Feb 04 '19 at 15:44
  • I believe using `textacy.extract.subject_verb_object_triples` in the form `sp500news3['title'].apply(textacy.extract.subject_verb_object_triples)` yields an empty result for some reason @Idlehands – W.R Feb 04 '19 at 16:07
  • Then it's probably best you create a [MCVE]. – r.ook Feb 04 '19 at 17:12
  • Also, you're not supposed to `.apply(textacy.extract.subject_verb_object_triples)`. Use `.apply(lambda x: [a for a in x])` as advised! Your example is only going to return a generator object, not the actual results. – r.ook Feb 04 '19 at 17:15
  • Generators are one-time-use objects. Once you exhaust a generator, it's gone. So you'll have to re-run whatever built that dataframe. – C.Nivs Feb 04 '19 at 17:46
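
To illustrate the point made in the comments, here is a minimal sketch of generator exhaustion in plain Python. `fake_triples` is a hypothetical stand-in for a triple extractor, not the textacy function, and the titles are made up for the example. Calling `list` on each element is what `Series.apply(list)` does per row, so the same behaviour would be expected in a dataframe column.

```python
def fake_triples(text):
    """Hypothetical stand-in extractor: yields one (subject, verb, object) tuple."""
    words = text.split()
    if len(words) >= 3:
        yield (words[0], words[1], words[2])


titles = ["Disney hurls magic", "MHP opens books"]

# One generator per title, similar to the "title" column in the question.
gens = [fake_triples(t) for t in titles]

# The first pass consumes every generator and collects its items.
first_pass = [list(g) for g in gens]

# The second pass finds the same generators already exhausted.
second_pass = [list(g) for g in gens]

print(first_pass)   # [[('Disney', 'hurls', 'magic')], [('MHP', 'opens', 'books')]]
print(second_pass)  # [[], []]

# Materialising at creation time avoids the problem: store lists, not generators.
triples = [list(fake_triples(t)) for t in titles]
```

If something (an earlier `apply`, a loop, or a debugging cell) has already iterated over the generators, later attempts would see only empty lists. Storing `list(...)` results when the column is first built keeps the values available for repeated display.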

1 Answer

I have tested the code below and it works:

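import textacy
import pandas as pd
from textacy import preprocessing
# Show full cell contents instead of truncating them (older pandas versions used -1 here)
pd.options.display.max_colwidth = None
# The bracketed names are placeholders: substitute your own column names
df['<New column name>'] = df['<Your column name that needs to be processed>'].apply(
    lambda x: preprocessing.normalize_whitespace(preprocessing.remove_punctuation(str(x)))
)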
Naveen Srikanth