
I'm trying to read a huge number of lines from standard input with Python.

more hugefile.txt | python readstdin.py

The problem is that the program freezes as soon as I've read just a single line.

import sys

print sys.stdin.read(8)
sys.exit(1)

This prints the first 8 bytes, but then I expect it to terminate, and it never does. I think it isn't really reading just the first bytes but is trying to read the whole file into memory.

Same problem with sys.stdin.readline()

What I really want to do, of course, is read all the lines, but in a buffered way so I don't run out of memory.

I'm using Python 2.6.

Martin
    Why are you using more instead of cat or even just a simple redirect of stdin? – Mark Byers Oct 27 '11 at 23:53
  • I don't see any reason why your program would "freeze". How are you detecting that it freezes? – Greg Hewgill Oct 27 '11 at 23:56
  • Ah, I was running it from the Windows command line. No `cat` on Windows. – Martin Oct 27 '11 at 23:56
  • I had the same thought as Mark, but then it turns out `more` appears to just act like `cat` when stdout is redirected. Still, `cat` would absolutely be a much better choice here. (I believe it's `type` on Windows, or something like that) – David Z Oct 27 '11 at 23:56
  • @GregHewgill If the file is small, the program terminates after reading the 8 bytes. With a huge file it doesn't. – Martin Oct 27 '11 at 23:58
  • Oh, who knows what `more` on Windows does (since Windows doesn't distinguish between "is a tty" and "isn't a tty" in the same way as Unix does). Definitely try with redirection (`python readstdin.py < hugefile.txt`). – Greg Hewgill Oct 27 '11 at 23:58
  • Ah yeah, `type` instead of `more` solved it! Thanks! – Martin Oct 27 '11 at 23:59
    Why did you choose not to use redirection? Using `type` is unnecessary even on Windows. [The purpose of `cat` is to concatenate (or "catenate") files. If it's only one file, concatenating it with nothing at all is a waste of time, and costs you a process.](http://smallo.ruhr.de/award.html) – Greg Hewgill Oct 28 '11 at 00:00
  • Redirection worked too. I'm not used to the Windows command line, so I just googled the Windows equivalent of `cat` and some site said `more`. And it worked fine until I got to this huge file. – Martin Oct 28 '11 at 00:02
  • So are we to conclude that "huge" == "more than a page full"? – Mark Ransom Oct 28 '11 at 00:05

2 Answers


This should work efficiently in a modern Python:

import sys

# iterating over sys.stdin yields one line at a time
for line in sys.stdin:
    # do something...
    print line,

You can then run the script like this:

python readstdin.py < hugefile.txt
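
As a small follow-up, here is a hedged sketch of what a fuller loop might look like; the "error" filter and the stderr summary are just illustrative assumptions, not part of the original question. Memory stays bounded because `sys.stdin` is iterated lazily:

import sys

count = 0
for line in sys.stdin:          # one line in memory at a time
    count += 1
    if 'error' in line:         # hypothetical filter, purely for illustration
        print line,             # trailing comma: the line already ends in '\n'

# write the summary to stderr so it doesn't mix with the filtered output
sys.stderr.write('processed %d lines\n' % count)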
Gringo Suave
  • How does the memory work here? Does it load a single line at a time and discard it when the next line is read in? Thanks – B.Mr.W. Aug 27 '13 at 17:55
  • Yes, it reads a line at a time and assigns it to the `line` variable. Old values of `line` are reclaimed once they're no longer referenced. – Gringo Suave Aug 27 '13 at 18:09

Back in the day, you had to use `xreadlines()` to get efficient line-at-a-time IO on huge files -- the docs now recommend plain `for line in file` instead.

Of course, this helps only if you're actually working on the lines one at a time. If you're just reading big binary blobs to pass on to something else, another mechanism may be just as efficient.
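
For that binary case, a minimal sketch of fixed-size chunked reading (the 64 KB chunk size is an arbitrary assumption) that also keeps memory bounded:

import sys

CHUNK_SIZE = 64 * 1024                   # arbitrary buffer size; tune to taste

while True:
    chunk = sys.stdin.read(CHUNK_SIZE)   # returns '' at end of input
    if not chunk:
        break
    sys.stdout.write(chunk)              # hand the blob on unchanged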

sarnold