Python "with" statement scope and order of statements

Question

OK, my coding is very rusty so I've been borrowing and adapting from tutorials.

I started playing around with BeautifulSoup, opening a file with:

import bs4

with open('event.html', encoding='utf8') as f:
    soup = bs4.BeautifulSoup(f, "lxml")

Later, I needed to find a string in the same file, and BS seemed more complicated for that, so I did:

lines = f.readlines()

And put it together with the previous instructions:

with open('event.html', encoding='utf8') as f:
    soup = bs4.BeautifulSoup(f, "lxml")
    lines = f.readlines()

What puzzles me is that if I swap the two lines so the block looks like this:

with open('event.html', encoding='utf8') as f:
    lines = f.readlines()
    soup = bs4.BeautifulSoup(f, "lxml")

Then the rest of my code breaks. Why is that?

Because .readlines() advances the file pointer to the end of the file, so when BS tries to read, the pointer is already at the end. – Dan-Dev, May 15 '17 at 13:57
So, should I use a different/better method to extract the lines? – greye, May 15 '17 at 14:00
You can reset the pointer to the start of the file as per user3381590's answer, or see http://stackoverflow.com/questions/10201008/using-readlines-twice-in-a-row-in-python – Dan-Dev, May 15 '17 at 14:03
Order is unimportant for me, but I was banging my head wondering why the code wasn't working, and then I was even more confused when I figured out that re-ordering that portion "fixed" it... if anyone has a suggestion I'll take it. – greye, May 15 '17 at 14:04
Strike that. The first one doesn't work either; it just doesn't crash the script, but len(lines) = 0. I followed the f.seek(0) suggestion and now it is OK. – greye, May 15 '17 at 14:11

score 2 · Accepted Answer · answered May 15 '17 at 14:00


Calling readlines() leaves the internal file pointer at the end of the file. I haven't used BeautifulSoup myself, but I assume it expects the input file to be positioned at the start (offset 0). Seeking back to the beginning with f.seek(0) should fix that.

with open('event.html', encoding='utf8') as f:
    lines = f.readlines()  # reads to the end of the file
    f.seek(0)  # rewind so the next reader starts from the beginning
    soup = bs4.BeautifulSoup(f, "lxml")

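As the comments on this thread confirm, BeautifulSoup does not set the file pointer back after reading either. In the original order, readlines() runs with the pointer already at the end of the file and simply returns an empty list instead of raising an error, which is why that version only appeared to work.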
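
To make the pointer behaviour concrete, here is a minimal sketch that uses only the standard library. The temporary file and the variable names are made up for illustration, and f.read() stands in for any consumer that reads a file object to the end. That BeautifulSoup behaves this way is inferred from the asker's report that len(lines) was 0, not something this snippet verifies.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", encoding="utf8", delete=False
) as tmp:
    tmp.write("<p>first</p>\n<p>second</p>\n")
    path = tmp.name

with open(path, encoding="utf8") as f:
    first_pass = f.readlines()      # consumes the whole file
    second_pass = f.readlines()     # pointer is at the end: nothing left to read
    f.seek(0)                       # rewind to the start of the file
    position_after_seek = f.tell()  # position right after rewinding
    third_pass = f.read()           # stand-in for a second reader of the same file

os.remove(path)

print(len(first_pass), len(second_pass), position_after_seek)
```

The second readlines() call does not raise; it just returns an empty list, which matches the silent failure described in the question's comments.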


user3381590


From my tests I believe neither BeautifulSoup nor readlines() set the pointer back. If the other runs first, BS will crash the script but readlines() will simply return empty and move on. Your f.seek(0) fixes this. Thanks! – greye May 15 '17 at 14:12
If BS does not set the pointer back, then `lines` should be an empty list when f.readlines is called. – user3381590 May 15 '17 at 14:13
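
On the earlier question of whether there is a different way to get the lines: one option, sketched here under assumptions, is to read the file once into a string and derive both results from that string, so the file pointer never matters. The helper name read_once and the temporary file are hypothetical. The BeautifulSoup call is left as a comment so the snippet runs without bs4 installed; BeautifulSoup accepts a string of markup as well as a file object.

```python
import os
import tempfile


def read_once(path):
    """Read the file a single time and return (text, lines). Hypothetical helper."""
    with open(path, encoding="utf8") as f:
        text = f.read()
    # keepends=True keeps the newline on each line, close to what readlines()
    # returns. Note that splitlines() also splits on a few extra separators
    # (form feed, for example) that readlines() does not.
    lines = text.splitlines(keepends=True)
    return text, lines


# Throwaway input file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", encoding="utf8", delete=False
) as tmp:
    tmp.write("<p>first</p>\n<p>second</p>\n")
    path = tmp.name

text, lines = read_once(path)
os.remove(path)

# With bs4 installed, the same string can be parsed directly:
# soup = bs4.BeautifulSoup(text, "lxml")
matching = [line for line in lines if "second" in line]
print(matching)
```

The trade-off is that the whole file is held in memory as one string, which is usually fine for a single HTML page.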
