I'm trying to read a 100 GB file through stdin one line at a time using

Port = open_port({fd, 0, 1}, [in, binary, {line, 4096}]),

but this floods my system with messages until I run out of RAM. Is there a way to get {active, once}-style flow control with ports? There is also io:get_line, but I was wondering whether the port approach could work.
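(For context, the port setup above amounts to a receive loop like the sketch below; messages arrive as fast as the emulator can read fd 0, with nothing to throttle them. handle_line/1 is a hypothetical callback, and the eof option is added here only so the loop can terminate.)

read_stdin_via_port() ->
    Port = open_port({fd, 0, 1}, [in, binary, {line, 4096}, eof]),
    port_loop(Port).

port_loop(Port) ->
    receive
        {Port, {data, {eol, Line}}} ->
            handle_line(Line),            %% a complete line (at most 4096 bytes)
            port_loop(Port);
        {Port, {data, {noeol, Chunk}}} ->
            handle_line(Chunk),           %% a fragment of a line longer than 4096 bytes
            port_loop(Port);
        {Port, eof} ->
            ok
    end.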

ForeverConfused

1 Answer


No, there is no flow control over ports, so if you can't process the data fast enough you should use another method. You can set binary mode on STDIN using

ok = io:setopts(standard_io, [binary]),

and then read it with file:read_line(standard_io), provided you are on Erlang/OTP 17 or newer (older releases had a performance-impacting bug in this code path).
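For completeness, a minimal sketch of the resulting read loop, written as an escript so it can be run directly against stdin; handle_line/1 is a hypothetical stand-in for whatever per-line processing you need:

#!/usr/bin/env escript
%% Usage: cat big_file | escript read_lines.escript
main(_Args) ->
    ok = io:setopts(standard_io, [binary]),
    loop(0).

loop(Count) ->
    case file:read_line(standard_io) of
        {ok, Line} ->
            handle_line(Line),          %% Line is a binary ending in <<"\n">>
            loop(Count + 1);
        eof ->
            io:format(standard_error, "processed ~p lines~n", [Count]);
        {error, Reason} ->
            io:format(standard_error, "read error: ~p~n", [Reason]),
            halt(1)
    end.

handle_line(_Line) ->
    %% hypothetical per-line processing (e.g. JSON decoding)
    ok.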

Hynek -Pichi- Vychodil
  • Hey Hynek, thank you, this does work. The JSONs I'm parsing are 100k or larger in size, and read_line seems to be returning before finding a newline, probably because it's exceeding the buffer size. Setting options like {readahead, X} returns an error saying it's not supported for standard_io. Any way around this? – ForeverConfused Jun 12 '16 at 19:17
  • @ForeverConfused: I don't know whether `readahead` can be set for `standard_io`. You can work around it by checking whether each chunk ends with a newline and concatenating unfinished lines (see the sketch after these comments). That performs copying in memory, but I don't know of any other solution. – Hynek -Pichi- Vychodil Jun 13 '16 at 15:46
  • I tried that with file:get_chars, doing the buffering myself, but even buffering only 1000 lines at a time seems to cause a memory overflow for some reason. I ended up having to pipe stdin to netcat and then use Erlang with {active, once}. – ForeverConfused Jun 13 '16 at 18:32
  • @ForeverConfused: Are you using binary? – Hynek -Pichi- Vychodil Jun 14 '16 at 06:54
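A minimal sketch of the workaround discussed in these comments, assuming you read fixed-size binary chunks with file:read/2 and reassemble lines yourself; the 64 KB chunk size and handle_line/1 are illustrative, not from the original post:

chunked_read() ->
    ok = io:setopts(standard_io, [binary]),
    chunk_loop(<<>>).

chunk_loop(Carry) ->
    case file:read(standard_io, 65536) of
        {ok, Data} ->
            %% split on newlines; the last element is the (possibly empty) unfinished tail
            Parts = binary:split(<<Carry/binary, Data/binary>>, <<"\n">>, [global]),
            {Lines, [Rest]} = lists:split(length(Parts) - 1, Parts),
            lists:foreach(fun handle_line/1, Lines),
            chunk_loop(Rest);
        eof when Carry =:= <<>> ->
            ok;
        eof ->
            handle_line(Carry);         %% final line without a trailing newline
        {error, Reason} ->
            exit({read_error, Reason})
    end.

This keeps memory bounded by roughly one chunk plus the longest line, at the cost of the in-memory copying mentioned above.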