In order not to block the reactor I would like to read files asynchronously, but I've found no obvious way of doing it using EventMachine. I've tried a few different approaches, but none of them feels right:

  • Just read the file, it'll block the reactor, but what the hell, it's not that slow (unless it's a big file, and then it definitely is).
  • Open the file for reading and read a chunk on each tick (but how much to read? too much and it'll block the reactor, too little and reading will get slower than necessary).
  • EM.popen('cat some/file', FileReader) feels really weird, but works better than the alternatives above. In combination with the LineAndTextProtocol it reads lines pretty swiftly.
  • EM.attach, but I haven't found any examples of how to use it, and the only thing I've found on the mailing list is that it's deprecated in favour of…
  • EM.watch, which I've found no examples of how to use for reading files.

How do you read files within an EventMachine reactor loop?

Theo

1 Answer

EM.attach/watch cannot be used on files, as select/epoll on a disk-based file descriptor will always return readable.
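A quick way to see the readiness behaviour for yourself, using only Ruby's standard library (no EventMachine involved). This is a sketch that assumes a POSIX system such as Linux or macOS; the file name and contents are made up for the demo. The file is read to EOF first, so there is nothing left to read, and select still reports it as readable:

```ruby
require "tempfile"

# Create a small throwaway file (the name and contents are arbitrary).
tmp = Tempfile.new("select-demo")
tmp.write("hello")
tmp.flush

file = File.open(tmp.path, "rb")
file.read # consume everything, so the descriptor is at EOF

# Zero timeout: return immediately with whatever is "ready".
ready = IO.select([file], nil, nil, 0)

# On POSIX systems a regular file is always reported as ready for reading,
# so this is true even though a read would return no data.
readable = !ready.nil? && ready[0].size == 1
puts "readable at EOF: #{readable}"

file.close
tmp.close!
```

Since readiness carries no information for disk files, a readiness-based reactor has nothing useful to wait on.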

Ultimately, it depends on what you're trying to do. If it's a small file, just File.read it. If it is larger, you can read small chunks over time. For example, EM::FileStreamer does this to send large files over the network.
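The "small chunks over time" idea could look roughly like the sketch below. ChunkedFileReader, its schedule: parameter, and the 16 KB chunk size are all hypothetical names and values made up for illustration; none of this is EventMachine API. The scheduler is injected so the example runs with the standard library alone. Inside a reactor you would presumably pass something like `->(&blk) { EM.next_tick(&blk) }` instead of the toy queue used here.

```ruby
require "tempfile"

# Hypothetical helper: reads one fixed-size chunk per scheduled callback,
# so no single callback holds the event loop for long.
class ChunkedFileReader
  CHUNK_SIZE = 16 * 1024 # assumed value, tune for your workload

  # schedule: any callable that accepts a block and runs it later.
  def initialize(path, schedule:, chunk_size: CHUNK_SIZE, &on_chunk)
    @io = File.open(path, "rb")
    @schedule = schedule
    @chunk_size = chunk_size
    @on_chunk = on_chunk
  end

  # Queues the first read; on_done fires once EOF is reached.
  def start(&on_done)
    @on_done = on_done
    @schedule.call { read_next }
    self
  end

  private

  def read_next
    chunk = @io.read(@chunk_size) # returns nil at EOF
    if chunk
      @on_chunk.call(chunk)
      @schedule.call { read_next } # yield to the loop before the next chunk
    else
      @io.close
      @on_done&.call
    end
  end
end

# Demo with a toy scheduler standing in for the reactor's tick queue.
queue = []
schedule = ->(&blk) { queue << blk }

tmp = Tempfile.new("chunk-demo")
tmp.write("a" * 40_000)
tmp.flush

sizes = []
done = false
ChunkedFileReader.new(tmp.path, schedule: schedule) { |c| sizes << c.bytesize }
                 .start { done = true }

queue.shift.call until queue.empty? # drain the "ticks" one at a time
# sizes is now [16384, 16384, 7232]
tmp.close!
```

Note that each File#read call is still a blocking call; this only bounds how long any one callback can hold the reactor, it does not make the disk IO itself asynchronous.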

Another common use-case is to tail a file and read in new contents when it changes. This can be achieved using EM.watch_file: http://github.com/jordansissel/eventmachine-tail

tmm1
  • Basically I want to read a few moderately large files (up to 10 Mb) in parallel and extract a piece of each line. – Theo May 05 '10 at 18:48
  • If the operation you need to perform is per-line, then reading a line of the file on each tick seems to make the most sense. You'd get the benefit of all of Ruby's line-based IO methods, your event blocks would most closely reflect your business logic, and doing less in each block simply means the ticks would happen faster. – SFEley Jun 04 '10 at 18:04
  • Reading a line on each tick is too slow, because I spend time inside the reactor waiting for IO, which is exactly what I want to avoid. I want to do other things (like process the line) while waiting for IO. – Theo Jun 04 '10 at 20:39