3

Hey, I am a newbie at Python and I need some help. I've written the following code:

try:
  it = iter(cmLines)
  line = it.next()
  while (line):
    if ("INFERNAL1/a" in line) or ("HMMER3/f" in line):
      title = line
      line = it.next()
      if word2(line) in namesList:  # if the second word in the line is in the list
        output.write(title)
        output.write(line)
        line = it.next()
        while ("//" not in line):
          output.write(line)
          line = it.next()
        output.write(line)
    line = it.next()
except Exception as e:
  print "Loop exited because:"
  print type(e)
  print "at " + line
finally:
  output.close()
  1. When the loop ends, it always throws an exception reporting that the loop stopped, even though it didn't terminate prematurely. How do I stop that?

  2. Is there a better, more idiomatic way to write this code? I have a big file with lots of information and I am trying to extract only the parts I need. Each record has the following format:

    Infernal1/a ...
    Name someSpecificName
    ...
    ...
    ...
    ...
    // 
    

Thank you

mechanical_meat
user2002121

5 Answers

2

RocketDonkey's answer is spot-on. Because of the complexity of the way you're iterating, there is no simple way to do this with a for loop, so you're going to need to explicitly handle StopIteration.

However, if you rethink the problem a bit, there are other ways around this. For example, a trivial state machine:

try:
    state = 0
    for line in cmLines:
        if state == 0:
            if "INFERNAL1/a" in line or "HMMER3/f" in line:
                title = line
                state = 1
        elif state == 1:
            if word2(line) in namesList:
                output.write(title)
                output.write(line)
                state = 2
            else:
                state = 0
        elif state == 2:
            output.write(line)
            if '//' in line:
                state = 0
except Exception as e:
    print "Loop exited because:"
    print type(e)
    print "at " + line
finally:
    output.close()

Alternatively, you can write a generator function that delegates to sub-generators (via yield from foo() if you're in 3.3, via for x in foo(): yield x if not), or various other possibilities, especially if you rethink your problem at a higher level.
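To make the generator idea a bit more concrete, here is a minimal sketch, not a drop-in replacement. It assumes that word2(line) means "the second whitespace-separated word of the line" (the question never shows word2), and the names read_block and wanted_blocks are made up for illustration. The code avoids print and it.next(), so it should run on both Python 2.6+ and Python 3.

```python
def read_block(first_line, lines):
    """Yield first_line, then following lines up to and including the '//' line."""
    yield first_line
    for line in lines:
        yield line
        if "//" in line:
            return


def wanted_blocks(lines, names, headers=("INFERNAL1/a", "HMMER3/f")):
    """Yield the lines of every block whose second line names an entry in names."""
    it = iter(lines)
    for line in it:
        if not any(header in line for header in headers):
            continue
        title = line
        name_line = next(it, None)  # None means the input ended right after a header
        if name_line is None:
            return
        words = name_line.split()
        # Assumption: word2(line) is the second whitespace-separated word.
        if len(words) > 1 and words[1] in names:
            yield title
            # Delegate to the sub-generator. On Python 3.3+ this loop could be
            # written as: yield from read_block(name_line, it)
            for out in read_block(name_line, it):
                yield out
```

The caller then only has to do something like `for line in wanted_blocks(cmLines, namesList): output.write(line)`. Because both generators share the same iterator, the outer for loop resumes right after the '//' line that the inner one consumed, and no StopIteration ever reaches the caller.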

That may not be what you want to do here, but it's usually worth at least thinking about "Can I turn this while loop and two explicit next calls into a for loop?", even if the answer turns out to be "No, not without making things less readable."

As a side note, you can probably simplify things by replacing the try/finally with a with statement. Instead of this:

output = open('foo', 'w')
try:
    blah blah
finally:
    output.close()

You can just do this:

with open('foo', 'w') as output:
    blah blah

Or, if output isn't a normal file, you can still replace the last four lines with:

with contextlib.closing(output):
    blah blah
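As a small illustration of that last case, here is a sketch with a made-up ListSink class standing in for "an output that has close() but isn't a normal file". Only contextlib.closing is real; everything else is hypothetical.

```python
from contextlib import closing


class ListSink(object):
    """Hypothetical output object: has write() and close(), but is not a file."""

    def __init__(self):
        self.lines = []
        self.closed = False

    def write(self, text):
        self.lines.append(text)

    def close(self):
        self.closed = True


sink = ListSink()
with closing(sink) as output:
    output.write("Name someSpecificName\n")
# By this point closing() has called sink.close(), and it would have done so
# even if the body of the with block had raised an exception.
```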
abarnert
  • @RocketDonkey: I was a bit worried that this was a bad example, because the state machine is complex enough to be worth having but simple enough to hand-unroll, and that's rarely true in real life. (Plus, I didn't think of good names for the states, so I just called them 0, 1, 2, which isn't exactly good advice…) So, I almost scrapped it and rewrote everything as coroutines, before realizing that the "If you're on 3.0-3.2, do this instead; if you're on 2.6-2.7, do that, …" would be 3x as long as the code… – abarnert Jan 23 '13 at 02:16
1

When you call line = it.next() and there is nothing left, a StopIteration exception is raised:

>>> l = [1, 2, 3]
>>> i = iter(l)
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<ipython-input-6-e590fe0d22f8>", line 1, in <module>
    i.next()
StopIteration

This will happen in your code every time because you are calling it at the end of your block, so the exception gets raised before the loop has a chance to circle back around and find that line is empty. As a band-aid fix, you could do something like this, where you catch the StopIteration exception and pass out of it (since that indicates it is done):

# Your code...
except StopIteration:
    pass
except Exception as e:
  print "Loop exited because:"
  print type(e)
  print "at " + line
finally:
  output.close()
RocketDonkey
  • +1. Getting this right can be tricky, which is why, as a general rule, a `while` loop around `it.next()` should usually be rewritten as a `for line in it:` loop. But when you're trying to advance the iterator extra times within the loop, this general advice doesn't work, so you need something like this, or a larger rewrite. – abarnert Jan 23 '13 at 00:53
  • @abarnert Ha, I had dropped in a `for` loop as a suggestion, and then I actually took a look at what s/he was trying to do and realized that it wouldn't quite achieve what they wanted :) – RocketDonkey Jan 23 '13 at 01:17
0

I like Parser Combinators, since they lead to a much more declarative style of programming.

For example with the Parcon library:

from string import letters, digits
from parcon import (Word, Except, Exact, OneOrMore,
                    CharNotIn, Literal, End, concat)

alphanum = letters + digits

UntilNewline = Exact(OneOrMore(CharNotIn('\n')) + '\n')[concat]
Heading1 = Word(alphanum + '/')
Heading2 = Word(alphanum + '.')
Name = 'Name' + UntilNewline
Line = Except(UntilNewline, Literal('//'))
Lines = OneOrMore(Line)
Block = Heading1['hleft'] + Heading2['hright'] + Name['name'] + Lines['lines'] + '//'
Blocks = OneOrMore(Block[dict]) + End()

And then, using Alex Martelli's Bunch class:

class Bunch(object):
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

names = 'John', 'Jane'
for block in Blocks.parse_string(config):
    b = Bunch(**block)
    if b.name in names and b.hleft.upper() in ("INFERNAL1/A", "HMMER3/F"):
        print ' '.join((b.hleft, b.hright))
        print 'Name', b.name
        print '\n'.join(b.lines)

Given this file:

Infernal1/a ...
Name John
...
...
...
...
//
SomeHeader/a ...
Name Jane
...
...
...
...
//
HMMER3/f ...
Name Jane
...
...
...
...
//
Infernal1/a ...
Name Billy Bob
...
...
...
...
//

the result is:

Infernal1/a ...
Name John
...
...
...
...
HMMER3/f ...
Name Jane
...
...
...
...
pillmuncher
0

1/ No exception handling

To avoid handling the StopIteration exception, you should look at the Pythonic way to deal with sequences (as abarnert mentioned):

it = iter(cmLines)
for line in it:
    # do

2/ Catching information

You can also try to capture your records with a regular expression. You know the exact form of the first line; next you want to capture the name and compare it against a list of admissible names; finally, you look for the next //. You can build a regexp that spans line breaks and use a group to capture the name you want to check. The Python documentation describes groups as follows:

(...)

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].

Here is an example of using groups, taken from the Python documentation:

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0)       # The entire match
'Isaac Newton'
>>> m.group(1)       # The first parenthesized subgroup.
'Isaac'
>>> m.group(2)       # The second parenthesized subgroup.
'Newton'
>>> m.group(1, 2)    # Multiple arguments give us a tuple.
('Isaac', 'Newton')
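
Applied to the question, a regexp along those lines might look like the sketch below. It rests on several assumptions that the question does not confirm: the header text starts its line, the terminator line starts with //, the name is the second whitespace-separated word of the second line, and the whole file fits in memory (which may not hold for a very big file). BLOCK_RE and wanted_text are names invented for this example.

```python
import re

BLOCK_RE = re.compile(
    r"^(?:INFERNAL1/a|HMMER3/f).*\n"  # header line
    r"\S+[ \t]+(\S+).*\n"             # name line; group 1 captures its second word
    r"(?:(?!//).*\n)*"                # body lines that do not start with //
    r"//.*\n?",                       # terminator line
    re.MULTILINE,
)


def wanted_text(text, names):
    """Return the concatenation of all blocks whose captured name is in names."""
    return "".join(
        match.group(0)
        for match in BLOCK_RE.finditer(text)
        if match.group(1) in names
    )
```

Here group(0) is the whole block and group(1) is the name, so the filtering is a plain membership test on the captured group.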

More on Regex.

Link

Iterator next() raising exception in Python: https://softwareengineering.stackexchange.com/questions/112463/why-do-iterators-in-python-raise-an-exception

kiriloff
0

You can ignore StopIteration explicitly:

 try:
     # parse file
     it = iter(cmLines)
     for line in it:
         # here `line = next(it)` might raise StopIteration
 except StopIteration:
     pass
 except Exception as e:
     # handle exception

Or call line = next(it, None) and check for None.
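
For the next(it, None) variant, a rough sketch of the question's loop could look like this. It assumes word2(line) is the second whitespace-separated word, and copy_wanted and its write parameter are hypothetical names; in the question's setting write would be output.write.

```python
def copy_wanted(lines, names, write):
    """Pass the lines of every wanted block to write(), without try/except."""
    it = iter(lines)
    line = next(it, None)  # None signals that the input is exhausted
    while line is not None:
        if "INFERNAL1/a" in line or "HMMER3/f" in line:
            title = line
            line = next(it, None)
            if line is None:
                break
            words = line.split()
            # Assumption: word2(line) is the second whitespace-separated word.
            if len(words) > 1 and words[1] in names:
                write(title)
                while line is not None and "//" not in line:
                    write(line)
                    line = next(it, None)
                if line is not None:
                    write(line)  # the terminating '//' line
        line = next(it, None)
```

Using `is not None` rather than the truthiness of line also means an empty string in the input would not end the loop early.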

To separate concerns, you could split the code in two parts:

  • split input into records:
from collections import deque
from itertools import chain, dropwhile, takewhile

def getrecords(lines):
    it = iter(lines)
    headers = "INFERNAL1/a", "HMMER3/f"
    while True:
        it = chain([next(it)], it) # force StopIteration at the end
        it = dropwhile(lambda line: not line.startswith(headers), it)
        record = takewhile(lambda line: not line.startswith("//"), it)
        yield record
        consume(record) # make sure each record is read to the end

def consume(iterable):
    deque(iterable, maxlen=0)
  • output records that you're interested in:
from contextlib import closing

with closing(output):
    for record in getrecords(cmLines):
        title, line = next(record, ""), next(record, "")
        if word2(line) in namesList:
            for line in chain([title, line], record):
                output.write(line)
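
One caveat if you are on a newer Python: since PEP 479 (the default behaviour from Python 3.7), a StopIteration that escapes from inside a generator body is turned into a RuntimeError, so relying on next(it) to end getrecords no longer works there. A possible variant that ends explicitly is sketched below. It yields each record as a list rather than a lazy iterator, which is a simplification of mine, and, as with takewhile in the version above, the // line is consumed but not included in the record.

```python
from itertools import dropwhile, takewhile


def getrecords(lines, headers=("INFERNAL1/a", "HMMER3/f")):
    """Yield each record as a list of lines, from its header up to (not including) '//'."""
    it = iter(lines)
    while True:
        # Skip ahead to the next header line; None means no header is left.
        first = next(dropwhile(lambda line: not line.startswith(headers), it), None)
        if first is None:
            return
        record = [first]
        # takewhile reads and discards the '//' line that stops it.
        record.extend(takewhile(lambda line: not line.startswith("//"), it))
        yield record
```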
jfs