
I'm trying to read a huge number of lines from standard input with Python.

more hugefile.txt | python readstdin.py

The problem is that the program freezes as soon as I've read just a single line.

import sys

print sys.stdin.read(8)
sys.exit(1)

This prints the first 8 bytes, but then I expect it to terminate, and it never does. I think it isn't really reading just the first bytes but is trying to read the whole file into memory.

Same problem with sys.stdin.readline()

What I really want to do, of course, is read all the lines, but in a buffered way so I don't run out of memory.

I'm using Python 2.6.

Martin
    Why are you using more instead of cat or even just a simple redirect of stdin? – Mark Byers Oct 27 '11 at 23:53
  • I don't see any reason why your program would "freeze". How are you detecting that it freezes? – Greg Hewgill Oct 27 '11 at 23:56
  • Ah, I was running it from the Windows command line. No `cat` on Windows. – Martin Oct 27 '11 at 23:56
  • I had the same thought as Mark, but then it turns out `more` appears to just act like `cat` when stdout is redirected. Still, `cat` would absolutely be a much better choice here. (I believe it's `type` on Windows, or something like that) – David Z Oct 27 '11 at 23:56
  • @GregHewgill If the file is small, the program terminates after reading the 8 bytes. With a huge file it doesn't. – Martin Oct 27 '11 at 23:58
  • Oh, who knows what `more` on Windows does (since Windows doesn't distinguish between "is a tty" and "isn't a tty" in the same way as Unix does). Definitely try with redirection (`python readstdin.py < hugefile.txt`). – Greg Hewgill Oct 27 '11 at 23:58
  • Ah yeah, `type` instead of `more` solved it! Thanks! – Martin Oct 27 '11 at 23:59
    Why did you choose not to use redirection? Using `type` is unnecessary even on Windows. [The purpose of `cat` is to concatenate (or "catenate") files. If it's only one file, concatenating it with nothing at all is a waste of time, and costs you a process.](http://smallo.ruhr.de/award.html) – Greg Hewgill Oct 28 '11 at 00:00
  • Redirection worked too. I'm not used to the Windows command line, so I just googled the Windows equivalent of `cat` and some site said `more`. And it worked fine until I got to this huge file. – Martin Oct 28 '11 at 00:02
  • So are we to conclude that "huge" == "more than a page full"? – Mark Ransom Oct 28 '11 at 00:05

2 Answers


This should work efficiently in a modern Python:

import sys

# iterating over sys.stdin yields one line at a time
for line in sys.stdin:
    # do something...
    print line,

You can then run the script like this:

python readstdin.py < hugefile.txt
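
As a small follow-up, here is a hedged sketch of what a fuller loop might look like; the "error" filter and the stderr summary are just illustrative assumptions, not part of the original question. Memory stays bounded because `sys.stdin` is iterated lazily:

import sys

count = 0
for line in sys.stdin:          # one line in memory at a time
    count += 1
    if 'error' in line:         # hypothetical filter, purely for illustration
        print line,             # trailing comma: the line already ends in '\n'

# write the summary to stderr so it doesn't mix with the filtered output
sys.stderr.write('processed %d lines\n' % count)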
Gringo Suave
  • How does the memory work here? Does it load a single line at a time and discard it when the next line is read in? Thanks – B.Mr.W. Aug 27 '13 at 17:55
  • Yes, it reads a line at a time and assigns it to the `line` variable. Old values of `line` are reclaimed once they're no longer referenced. – Gringo Suave Aug 27 '13 at 18:09

Back in the day, you had to use `xreadlines()` to get efficient line-at-a-time IO on huge files -- the docs now recommend plain `for line in file` instead.

Of course, this helps only if you're actually working on the lines one at a time. If you're just reading big binary blobs to pass on to something else, another mechanism may be just as efficient.
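
For that binary case, a minimal sketch of fixed-size chunked reading (the 64 KB chunk size is an arbitrary assumption) that also keeps memory bounded:

import sys

CHUNK_SIZE = 64 * 1024                   # arbitrary buffer size; tune to taste

while True:
    chunk = sys.stdin.read(CHUNK_SIZE)   # returns '' at end of input
    if not chunk:
        break
    sys.stdout.write(chunk)              # hand the blob on unchanged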

sarnold