I am using Jupyter Notebook here.

This code is from a YouTube video. It worked on the YouTuber's computer, but on mine it raises a StopIteration error.

Here I am trying to get all the titles (questions from the CSV) that relate to the Go language.

import pandas as pd

df = pd.read_csv("Questions.csv", encoding = "ISO-8859-1", usecols = ["Title", "Id"])

titles = [_ for _ in df.loc[lambda d: d['Title'].str.lower().str.contains(" go "," golang ")]['Title']]

#new cell

import spacy

nlp = spacy.load("en_core_web_sm" , disable= ["ner"])

#new cell

def has_golang(text):
    doc = nlp(text)
    for t in doc:    
        if t.lower_ in [' go ', 'golang']:
            if t.pos_ != 'VERB':
                if t.dep_ == 'pobj':
                    return True
    return False

g = (title for title in titles if has_golang(title))
[next(g) for i in range(10)]

#This is the error

StopIteration                             Traceback (most recent call last)
<ipython-input-56-862339d10dde> in <module>
      9 
     10 g = (title for title in titles if has_golang(title))
---> 11 [next(g) for i in range(10)]

<ipython-input-56-862339d10dde> in <listcomp>(.0)
      9 
     10 g = (title for title in titles if has_golang(title))
---> 11 [next(g) for i in range(10)]

StopIteration: 

From the research I have done so far, I think it might be a bug.

All I want to do is get the titles that satisfy the 3 'if' conditions.

Link to the YouTube video: https://youtu.be/WnGPv6HnBok

Jonas
vedant
  • If you're trying to get the elements of g from its generator, just use `elements = list(g)`. Using `[next(g) for i in range(10)]` will raise `StopIteration` unless there are 10 or more items in the generator. – DarrylG Mar 24 '21 at 20:53
  • It is returning empty square brackets, `[]` – vedant Mar 24 '21 at 21:14
  • Can you be more specific about where exactly I should put that piece of code? I replaced the `[next(g) for i in range(10)]` with your suggestion – vedant Mar 24 '21 at 21:16
  • @vedant--checking the video, you changed the expression for titles. The video has `titles = [_ for _ in df.loc[lambda d: d['Title'].str.lower().str.contains("go")]['Title']]` – DarrylG Mar 24 '21 at 22:01
  • Yep, it worked. Thank you. I thought tweaking that part of the line would make it faster – vedant Mar 25 '21 at 17:05
  • @vedant--the issue is that the signature of contains is [str.contains(pat, case=True, flags=0, na=None, regex=True)](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.contains.html). You're passing two arguments, i.e. `" go "," golang "`. The first corresponds to pat and the second to case, so " golang " is never used as a pattern. As mentioned in the video, placing spaces around the word causes problems, since you won't match the word when it appears at the beginning or end of a title. You could try pat = r"\b(?:go|golang)\b" as a regex. – DarrylG Mar 25 '21 at 17:14
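
The keyword-matching point in the comment above can be tried out without pandas. What follows is only a rough sketch using the standard `re` module: the sample titles are made up for illustration and are not taken from Questions.csv, and `GO_PATTERN` is a hypothetical name. The non-capturing group keeps `\b` attached to both alternatives, so words such as "Going" are not picked up.

```python
import re

# Word-boundary pattern; the group makes \b apply to both "go" and "golang".
GO_PATTERN = re.compile(r"\b(?:go|golang)\b", flags=re.IGNORECASE)

# Hypothetical sample titles (assumption: not from the real dataset).
sample_titles = [
    "Concurrency patterns in Go",
    "Golang: reading a file line by line",
    "Where did my Django session go wrong",
    "Going from Java to Kotlin",
]

# Keep titles where "go" or "golang" appears as a whole word,
# including at the very start or end of the title.
matches = [t for t in sample_titles if GO_PATTERN.search(t)]
print(matches)
```

The third sample title still matches even though "go" is a verb there, which is the kind of case the spaCy check in `has_golang` is meant to filter out afterwards.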

1 Answer

The StopIteration is the result of calling next() on an exhausted iterator, i.e. g produces fewer than 10 results. You can get this information from the help() function.

help(next)
Help on built-in function next in module builtins:
next(...)
    next(iterator[, default])
    
    Return the next item from the iterator. If default is given and the iterator
    is exhausted, it is returned instead of raising StopIteration.
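
If the goal is to take up to 10 items without risking StopIteration, one option is `itertools.islice`, or `next()` with a default as the help text describes. This is a minimal sketch, assuming a small made-up list in place of the real titles:

```python
from itertools import islice

# Hypothetical stand-in for the filtered titles; only 3 items here.
titles = ["Go channels", "Intro to golang", "Go modules"]
g = (t for t in titles)

# islice stops quietly when the generator runs out, so asking for
# 10 items from a 3-item generator simply yields those 3.
first_ten = list(islice(g, 10))
print(first_ten)

# next() with a default returns the default instead of raising
# StopIteration; g is already exhausted at this point.
leftover = next(g, None)
print(leftover)
```

Either form avoids the exception, but an empty or short result still means fewer titles passed the filter than expected.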

Edit

Your has_golang is incorrect. The first test can never match ' go ', because nlp splits the text into tokens and a word token does not include the spaces around it. Try this:

def has_golang(text):
    doc = nlp(text)
    for t in doc:    
        if t.lower_ in ['go', 'golang']:
            if t.pos_ != 'VERB':
                if t.dep_ == 'pobj':
                    return True
    return False

I figured this out by finding a title which should result in True from has_golang. I then ran the following code:

doc = nlp("Making a Simple FileServer with Go and Localhost Refused to Connect")
print("\n".join(str((t.lower_, t.pos_, t.dep_)) for t in doc))
('making', 'VERB', 'csubj')
('a', 'DET', 'det')
('simple', 'PROPN', 'compound')
('fileserver', 'PROPN', 'dobj')
('with', 'ADP', 'prep')
('go', 'PROPN', 'pobj')
('and', 'CCONJ', 'cc')
('localhost', 'PROPN', 'conj')
('refused', 'VERB', 'ROOT')
('to', 'PART', 'aux')
('connect', 'VERB', 'xcomp')

Looking at ('go', 'PROPN', 'pobj'), PROPN is not VERB and pobj is pobj, so the issue has to be with the token text: it is "go", not " go ".


Original Response

If you just want the titles that satisfy the 3 if conditions, skip the generator:

g = list(filter(has_golang, titles))

If you need the generator but also want a list:

g = (title for title in titles if has_golang(title))
list(g)
Michael Ruth
  • I tried list(g) but it gave me `[]`, an empty list – vedant Mar 24 '21 at 21:05
  • I need it to show all the titles satisfying those if conditions – vedant Mar 24 '21 at 21:06
  • That's because nothing in titles satisfies the `if` conditions. Please provide `Questions.csv`. It is difficult to diagnose why `g` is empty without the source data. – Michael Ruth Mar 24 '21 at 21:16
  • https://www.kaggle.com/stackoverflow/stacksample?select=Questions.csv – vedant Mar 24 '21 at 21:21
  • you might understand the problem better with the video https://youtu.be/WnGPv6HnBok look from 18:02 in the video – vedant Mar 24 '21 at 21:26
  • OK, so something is wrong with `has_golang()`. What have you tried in order to see what's going on in `has_golang()`? – Michael Ruth Mar 24 '21 at 21:59
  • As I commented above, the OP expression for titles is incorrect (i.e. doesn't follow the video). Should be: `titles = [_ for _ in df.loc[lambda d: d['Title'].str.lower().str.contains("go")]['Title']]`. – DarrylG Mar 24 '21 at 23:46
  • @DarrylG, yup, and by padding the `str.contains` arguments with spaces the OP excludes titles which end with `go` or `golang`. – Michael Ruth Mar 25 '21 at 02:26
  • @MichaelRuth--another issue is that to match multiple keywords the OP needed `'|'.join(includeKeywords)`. Having `...str.contains(" go "," golang ")` only picks up the first keyword, i.e. " go " (" golang " is ignored). – DarrylG Mar 25 '21 at 02:38
  • Yeah, I guess I don't understand what the OP wants. If the OP wants the same result as in the video then just copy the code from the video, it's pretty simple. – Michael Ruth Mar 25 '21 at 03:15
  • Yep, it worked. Thank you. I thought tweaking that part of the line would make it faster, but that messed it up. And thanks for telling me about the str.contains() part, I will remember that – vedant Mar 25 '21 at 17:11
  • @vedant, channel Dr. Knuth: https://en.wikipedia.org/wiki/Program_optimization#When_to_optimize – Michael Ruth Mar 25 '21 at 19:17