24

I need a very inexpensive way of reading a buffer with no terminating string (a stream) in Python. This is what I have, but it wastes a lot of CPU time because it is constantly "trying and catching." I really need a new approach.

Here is a reduced working version of my code:

#!/usr/bin/env python
import fcntl, os, sys

if __name__ == "__main__":
    f = open("/dev/urandom", "r")
    fd = f.fileno()
    fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)

    ready = False
    line = ""
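    while True:
        try:
            char = f.read(1)  # read one character at a time
            if char == '\r':
                continue
            elif char == '\n':
                ready = True
            else:
                line += char
        except IOError:  # no data available yet (EAGAIN), so try again
            continue
        if ready:
            print line
            line = ""  # start collecting the next line
            ready = False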

Don't run this in the terminal. It's simply for illustration. "urandom" will break your terminal because it spits out a lot of random characters that the terminal emulator interprets no matter what (which can change your current shell's settings, title, etc.). I was reading from a GPS connected via USB.

The problem: this uses 100% of the CPU when it can. I have tried this:

#!/usr/bin/env python
import fcntl, os, sys

if __name__ == "__main__":
    f = open("/dev/urandom", "r")
    fd = f.fileno()
    fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)

    for line in f.readlines():
        print line

However, I get IOError: [Errno 11] Resource temporarily unavailable. I have tried to use Popen amongst other things. I am at a loss. Can someone please provide a solution (and please explain everything, as I am not a pro, per se). Also, I should note that this is for Unix (particularly Linux, but it must be portable across all versions of Linux).

dylnmc
  • WOOOOW! I literally just found it... new to Python 2.6, though... https://docs.python.org/2/library/io.html – dylnmc Sep 30 '14 at 18:47
  • You'll want to use an unbuffered mode, though. – matsjoyce Sep 30 '14 at 18:48
  • I think it does that for you; `io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)` – dylnmc Sep 30 '14 at 18:48
  • just to be clear, are you really trying to read urandom or is that a stand-in for a regular file or perhaps a pipe? – tdelaney Sep 30 '14 at 19:15
  • @tdelaney Nope! I'm trying to read a gps puck from /dev/ttyUSB0. It's hidden in the last sentence of the second paragraph. – dylnmc Sep 30 '14 at 19:17
  • The select module may help (you wait on a select call instead of polling constantly). – tdelaney Sep 30 '14 at 19:33
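
The `select` suggestion in the last comment can be sketched roughly as follows. This is only an illustration under stated assumptions: a pipe from `os.pipe()` stands in for the real device, and `wait_and_read` is a hypothetical helper name, not something from the thread. The process sleeps inside `select.select()` until data arrives or the timeout expires, so it does not spin.

```python
import os
import select


def wait_and_read(fd, timeout):
    """Sleep in select() until fd is readable or the timeout expires.

    Returns the bytes that were read, or None if nothing arrived in time.
    (Hypothetical helper, used here only for illustration.)
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    return os.read(fd, 4096)


# A pipe stands in for the device; this is an assumption made for the demo.
read_fd, write_fd = os.pipe()

nothing_yet = wait_and_read(read_fd, 0.1)  # no data written yet: times out
os.write(write_fd, b"hello\n")
chunk = wait_and_read(read_fd, 0.1)        # data is waiting: returns at once

os.close(write_fd)
os.close(read_fd)
```

With a timeout, the loop can also wake up periodically to check a stop condition, which a plain blocking read cannot do.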

3 Answers

13

The simple solutions are the best:

with open('/dev/urandom', 'r') as f:
    for line in f:
        print line.encode('hex')  # Don't mess up my terminal

Or, alternatively:

with open('/dev/urandom', 'r') as f:
    for line in iter(f.readline, ''):
        print line.encode('hex')  # Don't mess up my terminal

Notes:

  • Leave the file descriptor in blocking mode, so the OS can block your process (and save CPU time) when there is no data available.

  • It is important to use an iterator in the loop. Consider `for line in f.readlines():`. `f.readlines()` reads all of the data, puts it all in a list, and returns that list. Since we have infinite data, `f.readlines()` will never return successfully. In contrast, `f` returns an iterator: it only gets as much data as it needs to satisfy the next loop iteration (and just a little more for a performance buffer).

  • The first version reads ahead and buffers enough data to print several lines. The second version returns each line immediately. Use the first version if conserving CPU is your primary concern. Use the second if interactive response time is your primary concern.
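
The second variant from the notes above can be tried without a device. The sketch below makes a few assumptions: a pipe from `os.pipe()` stands in for the stream, the stream is opened in binary mode so the sentinel is `b""`, and `read_lines` is a hypothetical helper name. The write end is closed only so that the demo terminates; a real stream would keep blocking in `readline()` instead.

```python
import os


def read_lines(stream):
    """Yield each line as soon as readline() returns it.

    iter(callable, sentinel) keeps calling stream.readline() until it
    returns the sentinel b"", which only happens at end of file.
    """
    for line in iter(stream.readline, b""):
        yield line.rstrip(b"\r\n")


# A pipe stands in for the device; this is an assumption made for the demo.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"first\r\nsecond\r\n")
os.close(write_fd)  # closing the writer produces EOF so the loop ends

with os.fdopen(read_fd, "rb") as stream:
    lines = list(read_lines(stream))

print(lines)
```

Because the descriptor stays in blocking mode, the process sleeps inside `readline()` whenever no complete line is available.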

Demonstration:

$ python x.py  | head -2l
eb99f1b3bf74eead42750c63cb7c16160fa7e21c94b176dc6fd2d6796a1428dc8c5d15f13e3c1d5969cb59317eaba37a97f4719bb3de87919009da013fa06ae738408478bc15c750850744a4edcc27d155749d840680bf3a827aafbe9be84e7c8e2fe5785d2305cbedd76454573ca9261ac9a480f71242baa94e8d4bdf761705a6a0fea1ba2b1502066b2538a62776e9165043e5b7337d45773d009fd06d15ca0d9b51af499c1c9d7684472272a7361751d220848874215bc494456b08910e9815fc533d3545129aad4f3f126dc5341266ca4b85ea949794cacaf16409bcd02263b08613190b3f69caa68a47758345dafb10121cfe6ed6c8098142682aef47d1080bd2e218b571824bf2fa5d0bb5297278be8a9a2f55b554631c99e5f1d9040c5bc2bde9a40c8b6e95fc47be6ea9235243582f2367893d15a1494f732d0346ec6184a366f8035aef9141c638128444b1549a64937697b1a170e648d20f336e352076893fa7265c8fa0f4e2207e87410e53b43a51aa146ac6c2decf274a45a58c4e442aececf28879a3e0b4a1278eac7a4f969b3f74e2f2a2064a55ff112c4c49092366dbaa125703962ec5083d09cdb750c0e1dbe34cadda66709f98ff63faccf0045993137bfaca949686bc395bbafb7cf9b5b3475a0c91bdea8cec4e9ac1a9c96e0b81c1c5f242ae72cdea4c073db0351322f9da31203ea34d1b6f298128435797f4846a53b0733069060680dbc2b44c662c4b685ced5419b65c01df41cc2dd9877dc2a97a965174d508a3c9275d8aee7f2991bbb06ca7e0010b0e5b9468aed12f5d2c9a65091223547b8655211df435ffbf24768d48c7e7cf3cb7225f2c116e94a8602078f2b34dab6852f57708e760f88f4085ec7dade19ed558a539f830adea1b81f46303789224802f1f090ec0ff59e291246f1287672b0035df07c359d2ada48e674622f61c0f456c36d130fb6cf7f529e7c4dfceccc594ba5e812a3250e022eca9576a5a8b31c0be13969841d5a4d52b10a7dc8ddd1cac279500cb66e3b244e7d1e042249fd8adf2a90fa8bee74378d79a3d55c6fcf6cc19aa85ffb078dba23ca88ea6810d4a1c5d98b3b33e68ddd41c881df167c36ab2e1b081849781e08e3a026fbd3755acf9f215e0402cbf1a021300f5c883f86a05d467479172109a8f20f93c8c255915a264463eb113c3e8d07d0cec31aa8c0f978a0e7e65c142e85383befd6679c69edd2c56599f15580bbb356d98cfdf012dbc6d1dd6c0dbcfe6f8235d3d5c015fb94d8cc29afdf5d69e33d0e5078d651782546bc2acccab9f35e595f0951a139526ae5651a3ebbec353e99f9ddd1615ed25529500dabe8bf6f12ee6b21a437caca12a6d9688986d94fb7c103dca1572350900e56276b857630a
9d024ef4454dcd5e35dd605a2d49c26ce44fae87ab33e7a158d328521c7d77969908ec5b67f01bf8e2c330dcb70b5f3def8e6d4b010c6d31e4cbe7478657782f10b6fc2d77e8ff7a2f1e590827827e1037b33b0a
Traceback (most recent call last):
  File "x.py", line 4, in <module>
    print line.encode('hex')  # Don't mess up my terminal
IOError: [Errno 32] Broken pipe
Robᵩ
  • Oh, sorry; did I mention that I need it to be unblocked? That blocks until EOF has been reached. – dylnmc Sep 30 '14 at 19:04
  • No, this blocks until end-of-line has been reached, then returns control to the body of the `for` loop. The key difference between `for line in f.readlines()` and `for line in f` is precisely that: the former blocks until EOF has been reached, the latter blocks until end-of-line has been reached. – Robᵩ Sep 30 '14 at 19:07
  • Yeah; I cannot have it block. I will be reading from a gps puck on a headless computer and will not be able to "tell" the program to stop. Also, you forgot a `try: except (KeyboardInterrupt):` if you even want to be able to read those lines. – dylnmc Sep 30 '14 at 19:08
  • My program has no need for a `try: ... except KeyboardInterrupt: ...`. If I want to read the lines, I'll redirect them or pipe them to `less`. As for the ability to make the program stop, perhaps you could add some more details to your question about what you are actually trying to achieve. – Robᵩ Sep 30 '14 at 19:13
  • We are talking about a file that never ends (in Unix at least). A stream has no terminating EOF, so this will continue forever... that is, until you hit `ctrl + c`. Then this will terminate the program and you will never print your lines. – dylnmc Sep 30 '14 at 19:14
  • This works (for me at least) with urandom, but not with the gps I am trying to read from – dylnmc Sep 30 '14 at 19:16
  • This probably works with urandom because some of the random characters are EOF – dylnmc Sep 30 '14 at 19:16
  • what about for files that only output ascii – dylnmc Sep 30 '14 at 19:17
  • 1
    1) To get it to work with GPS, try the 2nd example (I just added it). 2) For files that only output ascii, replace `print line.encode('hex')` with `print line`. 3) It works for urandom because urandom always has more data ready. (There is no such thing as an EOF character. EOF is a condition, not a character value.) – Robᵩ Sep 30 '14 at 19:19
  • Oh! I wish it were that simple. Like I said; it **doesn't** work. – dylnmc Sep 30 '14 at 19:19
  • code: `#! /usr/bin/env/ python if __name__ == "__main__": with open('/dev/ttyUSB0', 'r') as f: for line in f: print line` doesn't work! – dylnmc Sep 30 '14 at 19:21
  • Only one problem: you can't do a `try: readline except: pass`, which is necessary in most cases because you start mid-stream and might get a wonky byte that causes an error (this happened to me a couple of times with the GPS stream, actually on the first try). – dylnmc Sep 30 '14 at 19:41
13

You will want to set your buffering mode to the size of the chunk you want to read when opening the file stream. From the Python documentation:

io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)

"buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer."

You also want to use the readable() method in the while loop, to avoid unnecessary resource consumption.

However, I advise you to use buffered streams such as io.BytesIO or io.BufferedReader.

More info in the docs.
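
As a rough sketch of the `buffering` parameter described above (assuming a pipe from `os.pipe()` as a stand-in for the device, and an arbitrary 64-byte buffer size chosen only for the demo): passing `buffering=0` in binary mode yields a raw, unbuffered stream, which can then be wrapped in an `io.BufferedReader` with an explicit buffer size.

```python
import io
import os

# A pipe stands in for the device; this is an assumption made for the demo.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"abc\ndef\n")
os.close(write_fd)  # closing the writer produces EOF so the demo terminates

# buffering=0 is only allowed in binary mode and returns a raw stream.
raw = io.open(read_fd, "rb", buffering=0)
is_raw = isinstance(raw, io.RawIOBase)

# Wrap the raw stream in a buffered reader with an explicit buffer size.
reader = io.BufferedReader(raw, buffer_size=64)
opened_for_reading = reader.readable()

first = reader.readline()
second = reader.readline()
at_eof = reader.readline()  # empty bytes once the writer has closed

reader.close()  # also closes the underlying raw stream and descriptor
```

Note that `readable()` only reports whether the stream was opened for reading; it does not say whether data is currently waiting, so it cannot replace a blocking read or a `select` call as a way to avoid spinning.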

Anoyz
  • That's what I decided to use. I think io.open should be sufficient because I am just reading text from a non-blocking file (stream). I don't really know much about buffering except for `BufferedReader` in Java. If I know that the line I will be reading will not be more than `n` characters, should I do `f = io.open("/dev/urandom", mode="r", buffering=n)`? – dylnmc Sep 30 '14 at 19:12
  • Also, someone commented that I want to use unbuffered mode, and I think they are correct! – dylnmc Sep 30 '14 at 19:14
0

I decided to use io. I noticed that this is much more accurate than even a while True:. The gps that I am reading from is supposed to spit out info every second, but I noticed it was really anywhere from .95 to 1.05 secs. That was when I was doing what I posted in my question.

However, when I simply do

#!/usr/bin/env python

import io

if __name__ == "__main__":
    f = io.open("/dev/ttyUSB0")
    while True:
        print f.readline().strip()

It not only temporarily blocks (which saves CPU time, and does all sorts of good), but it also apparently keeps the buffer extremely up to date because it seems to produce results almost exactly one second apart (which is when my GPS, like most, updates).

A true miracle that class is - a true miracle - that is, if it were the only way to do it like this. One could just use open(file, "r"), and it works fine (which angers me because I spent an entire day on this).

dylnmc
  • Does this program work equally well if you replace `io.open` with `open`? – Robᵩ Sep 30 '14 at 19:42
  • 1
    Oh my gosh; it actually does. I was 99% sure I already tried that. – dylnmc Sep 30 '14 at 19:46
  • I **did** try that, **but** I used `f = open("/dev/ttyUSB0", "r")`, `fd = f.fileno()`, `fl = fcntl.fcntl(fd, fcntl.F_GETFL)`, `fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)` then the while true... readline which did **not** work. – dylnmc Sep 30 '14 at 19:48
  • 3
    `while True: line = f.readline()` is more-or-less equivalent to `for line in iter(f.readline, ''): pass`. Each one calls `f.readline()` in an infinite loop. (The difference, as you've pointed out, is the ability to wrap `f.readline()` in a `try: except:`). – Robᵩ Sep 30 '14 at 19:52
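
The point in the comments about wrapping `f.readline()` in a `try`/`except` can be sketched like this. Several things here are assumptions for illustration: a pipe from `os.pipe()` stands in for `/dev/ttyUSB0`, the data is taken to be ASCII text, the "wonky byte" problem is taken to be a decoding error, and `decoded_lines` is a hypothetical helper name. The stream is read as bytes with a blocking `readline()`, and any line that fails to decode (for example a partial line picked up mid-stream) is skipped.

```python
import io
import os


def decoded_lines(stream, encoding="ascii"):
    """Yield decoded, stripped lines; skip lines that fail to decode.

    (Hypothetical helper, used here only for illustration.)
    """
    while True:
        raw_line = stream.readline()  # blocks until a full line or EOF
        if not raw_line:
            break  # EOF: only reached here because the demo closes the pipe
        try:
            text = raw_line.decode(encoding)
        except UnicodeDecodeError:
            continue  # garbage picked up mid-stream, drop this line
        yield text.strip()


# A pipe stands in for the device; this is an assumption made for the demo.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"\xff\xfepartial\ngood line 1\r\ngood line 2\r\n")
os.close(write_fd)

with io.open(read_fd, "rb") as stream:
    lines = list(decoded_lines(stream))
```

Catching only `UnicodeDecodeError` keeps `KeyboardInterrupt` and real I/O errors visible, unlike a bare `except:`.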