9

How do I check for EOF in Python? I found a bug in my code where the last block of text after the separator isn't added to the return list. Or maybe there's a better way of expressing this function?

Here's my code:

def get_text_blocks(filename):
    text_blocks = []
    text_block = StringIO.StringIO()
    with open(filename, 'r') as f:
        for line in f:
            text_block.write(line)
            print line
            if line.startswith('-- -'):
                text_blocks.append(text_block.getvalue())
                text_block.close()
                text_block = StringIO.StringIO()
    return text_blocks
ajushi

5 Answers

2

You might find it easier to solve this using itertools.groupby.

def get_text_blocks(filename):
    import itertools
    with open(filename,'r') as f:
        groups = itertools.groupby(f, lambda line:line.startswith('-- -'))
        return [''.join(lines) for is_separator, lines in groups if not is_separator]
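
To see what groupby is doing, here is a minimal sketch that runs the same grouping on an in-memory list of lines instead of a file. The helper name `split_blocks` and the sample lines are made up for illustration; they are not part of the original code.

```python
import itertools

def split_blocks(lines):
    # groupby collects consecutive lines that share the same key, here
    # "is this line a separator?". Separator groups are dropped and each
    # remaining group is joined into one block.
    groups = itertools.groupby(lines, lambda line: line.startswith('-- -'))
    return [''.join(block) for is_separator, block in groups if not is_separator]

# Hypothetical sample input: three blocks divided by two separator lines.
sample = ['a\n', 'b\n', '-- -\n', 'c\n', '-- -\n', 'd\n']
blocks = split_blocks(sample)
print(blocks)  # ['a\nb\n', 'c\n', 'd\n']
```

Note that the last block ('d\n') is included without any EOF check, and that, unlike the code in the question, the separator lines themselves are not kept in the blocks.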

Another alternative is to use a regular expression to match the separators:

def get_text_blocks(filename):
    import re
    separator = re.compile(r'^-- -.*', re.M)
    with open(filename,'r') as f:
        return re.split(separator, f.read())
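
A small sketch of how the split behaves on a string, since one detail is easy to miss: `.` does not match a newline, so the newline that ends each separator line stays at the start of the following block. The sample text and the second pattern (which also consumes that newline) are my own illustration, not part of the answer above.

```python
import re

# Hypothetical sample text with two separator lines.
text = 'a\nb\n-- - 1\nc\n-- - 2\nd\n'

# Same pattern as above: the separator's trailing newline is left behind.
blocks_original = re.split(re.compile(r'^-- -.*', re.M), text)
print(blocks_original)  # ['a\nb\n', '\nc\n', '\nd\n']

# Variant (an assumption about what you may want): also consume the newline.
blocks_trimmed = re.split(re.compile(r'^-- -.*\n?', re.M), text)
print(blocks_trimmed)  # ['a\nb\n', 'c\n', 'd\n']
```

If the text ends with a separator line, the variant leaves an empty string as the last item of the result, which you may want to filter out.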
Mark Byers
  • Interesting answers Mark. I didn't know about itertools, thanks. – ajushi Jan 03 '10 at 04:31
  • +1 For RegEx version, the itertools version is slightly cryptic. – Maiku Mori Jan 03 '10 at 04:40
  • I tried the itertools version in the interactive interpreter and it returns an empty string. lines seems to be an itertools._grouper object – ajushi Jan 03 '10 at 04:44
  • It's unlikely to return an empty string. It always returns a list. You must have a copy/paste error. – Mark Byers Jan 03 '10 at 04:50
  • Sorry my bad, an empty list I mean. – ajushi Jan 03 '10 at 04:53
  • Well all I can say is that it works here for the files I tested it on. Maybe you gave it an empty file, or a file where every line was a separator? I can't really explain it without more details. You can just use the regex method (the second alternative) if you can't get the first working (though I suspect that whatever you are doing wrong with the first method will also cause problems with the second). – Mark Byers Jan 03 '10 at 04:57
  • You're right, the file object was exhausted because I had already iterated through it, my bad again. Anyway, thank you for itertools :) – ajushi Jan 03 '10 at 05:02
1

The end-of-file condition holds as soon as the for statement terminates, so the simplest minimal fix is to handle the last block right after the loop: extract text_block.getvalue() there and append it (checking first that it's not empty, if you want).
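
A minimal sketch of that fix, with two liberties that are my own assumptions rather than part of the question's code: the function takes any iterable of lines (so it can be tried without a file), and it collects lines in a plain list instead of a StringIO buffer. The name `get_text_blocks_from_lines` is hypothetical.

```python
def get_text_blocks_from_lines(lines):
    text_blocks = []
    current = []  # lines of the block currently being built
    for line in lines:
        current.append(line)
        if line.startswith('-- -'):
            # A separator closes the current block (and is kept in it,
            # as in the question's code).
            text_blocks.append(''.join(current))
            current = []
    # The for loop has finished, so the input is exhausted (EOF for a file).
    # Append whatever is left, unless the input ended on a separator.
    if current:
        text_blocks.append(''.join(current))
    return text_blocks

blocks = get_text_blocks_from_lines(['a\n', '-- -\n', 'b\n'])
print(blocks)  # ['a\n-- -\n', 'b\n']
```

With a real file you would pass the open file object, for example `get_text_blocks_from_lines(f)` inside a `with open(filename) as f:` block.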

Alex Martelli
  • Thanks Alex! My dirty solution was to add text_blocks.append(text_block.getvalue()) and text_block.close() below the for block. It works but it's not DRY :/ – ajushi Jan 03 '10 at 04:47
1

This is the standard problem with emitting buffers.

You don't need to detect EOF. Once the loop ends, you simply write out the last buffer.

import StringIO  # Python 2 module; on Python 3 use io.StringIO instead

def get_text_blocks(filename):
    text_blocks = []
    text_block = StringIO.StringIO()
    with open(filename, 'r') as f:
        for line in f:
            text_block.write(line)
            print line
            if line.startswith('-- -'):
                text_blocks.append(text_block.getvalue())
                text_block.close()
                text_block = StringIO.StringIO()
        ### At this moment, you are at EOF
        last_block = text_block.getvalue()
        if last_block:
            text_blocks.append(last_block)
        ### Now your final block (if any) is appended.
    return text_blocks
S.Lott
-1

Why do you need StringIO here?

def get_text_blocks(filename):
    text_blocks = [""]
    with open(filename, 'r') as f:
        for line in f:
            if line.startswith('-- -'):
                text_blocks.append(line)
            else:
                text_blocks[-1] += line
    return text_blocks

EDIT: Fixed the function. Other suggestions might be better; I just wanted to write a function similar to the original one.

EDIT: I had assumed the file starts with "-- -". Starting the list with an empty string "fixes" the IndexError; alternatively, you could use this version, which skips any lines that come before the first separator:

def get_text_blocks(filename):
    text_blocks = []
    with open(filename, 'r') as f:
        for line in f:
            if line.startswith('-- -'):
                text_blocks.append(line)
            else:
                if len(text_blocks) != 0:
                    text_blocks[-1] += line
    return text_blocks

But both versions look a bit ugly to me; the regex version is much cleaner.

Maiku Mori
-2

This is a fast way to see if you have an empty file:

if f.read(1) == '':
    print "EOF"
    f.close()
octopusgrabbus
AndroidDebaser
  • No, because there is no space between the ''. I tested this on a file with just a space, and it didn't detect that the file was empty. – AndroidDebaser Apr 23 '13 at 18:46
  • If a file contains a space it isn't empty. – Dave Jul 04 '14 at 01:30
  • AndroidDebaser: this is an incomplete answer. `f.read(1)` will consume 1 character (your single space) so it needs to be in a loop, something like `while f.read(1) != '':` would iterate until there is nothing left to iterate on. – Gary Howe Mar 02 '22 at 20:16
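
The loop described in the last comment can be sketched like this. It relies on the fact that `read()` on a text-mode file returns an empty string only at end of file (for a binary-mode file the sentinel would be `b''`). The helper name `read_chunks`, the chunk size, and the use of `io.StringIO` as a stand-in for a real file are assumptions made for the example.

```python
import io

def read_chunks(f, size=4):
    # Keep reading fixed-size chunks until read() returns '',
    # which for a text-mode file only happens at EOF.
    chunks = []
    while True:
        chunk = f.read(size)
        if chunk == '':
            break
        chunks.append(chunk)
    return chunks

# io.StringIO stands in for an open text file here.
f = io.StringIO(u'hello world')
chunks = read_chunks(f)
print(chunks)  # ['hell', 'o wo', 'rld']
```

For line-oriented input such as the question's, iterating with `for line in f:` is usually simpler, since the loop ends by itself at EOF.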