I just read a bunch of posts on how to handle the StopIteration error in Python, but I had trouble solving my particular example. Basically, I have a csv file with a lot of prefixes. This file has two columns with headers: Word and Count. Count is the frequency with which that prefix occurs. I also have another file with a list of company names. The prefixes in the prefix file were taken from the first word of each company name in the company file. I'm trying to remove duplicates, and what I want to do right now is:

Ignore the StopIteration error every time this error would occur.

In other words, instead of having to write all the commented-out "if" statements below, I just want one line that says: if a StopIteration error is generated, simply ignore the error in some way by treating the problematic "prefix" as if it were a prefix which occurs two or more times in the prefix file, so that we return the company name without the prefix. I realize that this ignores the fact that the prefix value in the prefix file differs from the actual prefix of the company name, but usually that has to do with non-American English letters being stored differently between Python and Excel, and a few other causes that don't seem particularly systematic, so I'll just remove those manually later.

My code is:

def remove_prefix(prefix, first_name):
    #try:
    #EXCEPTIONS:
    #if '(' in prefix:
    #    prefix = prefix[1:]
    #if ')' in prefix:
    #    prefix = prefix[:-1]
    """
    if prefix == "2-10":
        prefix = "2"
    if prefix == "4:2:2":
        prefix = "4"
    if prefix == "5/0" or prefix == "5/7" or prefix == "58921-":
        prefix = "5"
    """
    #except StopIteration:
    #    pass

    print(first_name, prefix)
    input_fields = ('Word', 'Count')
    reader = csv.DictReader(infile1, fieldnames=input_fields)
    #if the prefix has a frequency of x >= 2 in the prefix file, then return first_name without prefix
    #else, return first_name
    infile1.seek(0)
    #print(infile1.seek(0))
    next(reader)
    first_row = next(reader)
    while prefix != first_row['Word'] and prefix[1:] != first_row['Word']:
        first_row = next(reader)
        #print(first_name, prefix)
        #print(first_row, first_name, prefix, '\t' + first_row['Word'], prefix[1:])
    if first_row['Count'] >= 2:
        length = len(prefix)
        first_name = first_name[length+1:]
    #print("first name is ", first_name)
    return first_name
twasbrillig
user1590499
  • Which line causes the exception (in the traceback)? – Andy Hayden Aug 31 '12 at 19:32
  • Thanks for looking into this. The line is the "while" statement, because "prefix" isn't in first_row['Word'] since it is just slightly off. – user1590499 Aug 31 '12 at 19:34
  • It looks like what those `if` statements are trying to do (for a few hardcoded special cases) is get the digits at the start of a string (stopping before any other characters, like `/`, `-`, or `:`). That could be done very easily with a regular expression. Would that solve the problem? – David Robinson Aug 31 '12 at 19:34
  • Not exactly, because I just made those if statements assign the prefix variable to a value that I knew would pass the check. What I'm looking for is a way that if the prefix variable has a value that is not in first_row['Word'], then the prefix value gets assigned a value that would pass the check. – user1590499 Aug 31 '12 at 19:38
  • @user1590499: Have you tried my suggested solution? – David Robinson Aug 31 '12 at 20:35

2 Answers

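I don't think this is caused by what you think it is caused by. The StopIteration exception is raised when the iterator (here, reader) runs out of lines to read, which is what happens when the while loop reaches the end of the file without finding a matching prefix.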

For example:

def g():
    "generates 1 (once)"
    yield 1

a = g()
next(a) # is 1
next(a) # StopIteration exception (nothing left to yield)
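
As a side note, the built-in next also takes an optional second argument, which is returned instead of raising StopIteration once the iterator is exhausted. A minimal sketch with the same kind of generator (the names gen, first and second, and the choice of None as the default, are only for illustration):

```python
def gen():
    """Yield 1 (once)."""
    yield 1

b = gen()
first = next(b, None)   # 1
second = next(b, None)  # None: the default is returned, no exception
```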

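To fix this you can wrap the call to next in a try/except. Note that the handler has to leave the loop: with a bare pass, first_row never changes once the reader is exhausted, so the loop would never end. After the break, first_row still holds the last row read, not a matching one.

while prefix != first_row['Word'] and prefix[1:] != first_row['Word']:
    try:
        first_row = next(reader)
    except StopIteration:
        break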

However, as David points out, this is probably not the way you ought to be going about this.
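
One way to sidestep the exception entirely might be to let a for loop consume the reader, since a for loop simply ends when the rows run out. The sketch below is not the asker's code: the helper name prefix_count, the default of 2 (chosen so that an unlisted prefix is treated as a frequent one, as the question asks) and the in-memory sample file are all assumptions made for illustration.

```python
import csv
import io

def prefix_count(infile, prefix, default=2):
    """Return the Count for prefix, or default if the prefix is not listed."""
    infile.seek(0)
    reader = csv.DictReader(infile)  # the first row supplies the Word/Count headers
    for row in reader:
        # Same test as the original while loop: exact match, or match
        # after dropping the first character of the prefix.
        if row["Word"] in (prefix, prefix[1:]):
            return int(row["Count"])
    # The loop ended without a match; there is no StopIteration to handle.
    return default

# Hypothetical prefix file, kept in memory for the example.
sample = io.StringIO("Word,Count\nAcme,3\nZeta,1\n")
found = prefix_count(sample, "Acme")    # listed prefix
missing = prefix_count(sample, "Ácme")  # not listed, falls back to the default
```

The caller can then compare the returned count against 2 exactly as before, without caring whether the prefix was actually found.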

Andy Hayden
  • Thanks man. Yeah I tried the try/except(pass) wrapper, but you guys are right that it's not addressing the heart of the matter. I'm still working on getting David's suggestion to work for me, and I'll post here below once I get it. Thanks for the help! – user1590499 Sep 01 '12 at 02:34
  • Not exactly what I needed but close enough that I was able to draw from it and solve my problem. I'm using a generator to unscramble permutations. Each permutation is the same length. – Michael Swartz Feb 23 '18 at 08:12
This could be done in a much easier way by creating a list of prefixes from the file first, and then using the startswith method on each. For example:

reader = csv.DictReader(infile1)
# this assumes the file has exactly two columns, with the headers Word and Count
prefixes = [row["Word"] for row in reader if int(row["Count"]) >= 2]

def remove_prefix(first_name):
    for p in prefixes:
        if first_name.startswith(p):
            return first_name[len(p):]
    return first_name

Wouldn't that be simpler? Another advantage is that it reads the file only once, instead of rewinding and rescanning it for every name it wants to process.
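
If the prefix is always the first whitespace-separated word of the company name (the question says the prefix file was built that way, but treat it as an assumption), a dictionary lookup could replace the startswith scan. This is only a sketch: load_counts, the default of 2 for prefixes missing from the file, and the sample data are hypothetical, and this variant also drops the space after the prefix.

```python
import csv
import io

def load_counts(infile):
    """Map each prefix in the Word column to its integer Count."""
    return {row["Word"]: int(row["Count"]) for row in csv.DictReader(infile)}

def remove_prefix(first_name, counts, default=2):
    """Drop the first word of first_name if it is a frequent (or unknown) prefix."""
    first_word, _, rest = first_name.partition(" ")
    # Prefixes missing from the file fall back to `default`, so they are
    # treated as frequent. Single-word names are returned unchanged.
    if rest and counts.get(first_word, default) >= 2:
        return rest
    return first_name

# Hypothetical prefix file, kept in memory for the example.
counts = load_counts(io.StringIO("Word,Count\nAcme,3\nZeta,1\n"))
```

With this sample data, remove_prefix("Acme Widgets", counts) gives "Widgets", while "Zeta Labs" is returned unchanged because Zeta occurs only once.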

David Robinson
  • Thanks so much. Sorry for the delay. I discovered a bunch of other problems when I tried to implement this (I created the prefix file itself in a way with a few errors), so I got bogged down in them for a few hours and wanted to post here when I had solved it. I'm not going to be at the computer this weekend but I'll be sure to post here as soon as I get the answer on Tuesday. Thanks again, I really appreciate your help, and intuitively your answer makes a lot of sense. I'm not clear on how the line where you define "prefixes" works exactly, but the general idea makes sense to me. – user1590499 Sep 01 '12 at 02:31
  • Hey David. Sorry for the belated reply. I've tried this out and it doesn't seem to work. More specifically, the statement where you assign "prefixes" to a for loop doesn't work. What happens is when I print "p" for every p in prefixes, it prints "Count" every single time. Furthermore, I have to take out the int(c) casting of c or I get an error, so I need to keep c as a string. Not sure why this is, though. Much thanks! – user1590499 Sep 05 '12 at 14:23
  • Thanks! The logic of this statement is pretty clear and it works like a charm :) – user1590499 Sep 05 '12 at 14:33