I want to wait until user input terminates with EOF and then output all of it at once. Isn't that what getContents is supposed to do? The following code produces output each time the user hits Enter. What am I doing wrong?

import System.IO

main = do
  hSetBuffering stdin NoBuffering
  contents <- getContents
  putStrLn contents
Cactus
Kirill Dubovikov
  • You'd be better off using [`isEOF`](http://hackage.haskell.org/package/base-4.7.0.1/docs/System-IO.html#v:isEOF). `getContents` returns the contents lazily, which means that if your processing is also lazy, you'll get chunks of input as they become available. Also, I think you mean `NoBuffering`, not `NoneBuffering`. – bheklilr Feb 05 '15 at 20:22
  • This won't solve your problem, but I think you meant to set buffering for `stdin`, not `stdout`. – Tikhon Jelvis Feb 05 '15 at 20:23

3 Answers

The fundamental problem is that getContents is an instance of lazy IO. This means that getContents produces a thunk that can be evaluated like a normal Haskell value, and the relevant IO only happens when that thunk is forced.

contents is a lazy list that putStr (or putStrLn, as in your code) tries to print, which forces the list and causes getContents to read as much as it can. putStr then prints everything that has been forced so far and keeps forcing the rest of the list until it hits []. Because getContents can read more of the stream as it arrives (the exact behavior depends on buffering), putStr can print each new piece immediately, which is the behavior you see.
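
To see this interleaving in isolation, here is a minimal sketch. The helper name echoUntilQuit is made up for illustration; interact is the Prelude function that feeds lazily read stdin through a pure String function. When run in a terminal, each line is typically echoed as soon as it is entered, and the program stops at the first line reading "quit" without waiting for EOF:

```haskell
-- Hypothetical helper: keep every line up to (not including) the first "quit".
echoUntilQuit :: String -> String
echoUntilQuit = unlines . takeWhile (/= "quit") . lines

main :: IO ()
main =
  -- interact reads stdin lazily, so output is produced as soon as
  -- enough input has arrived to compute the next line.
  interact echoUntilQuit
```

The input after "quit" is never demanded, so it is never read. That is convenient here, but it is the same mechanism that makes the original program print before EOF.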

While this behavior is useful for very simple scripts, it ties Haskell's evaluation order to observable effects, something evaluation order was never meant to control. This means that controlling exactly when parts of contents get printed is awkward, because you have to break the normal Haskell abstraction and understand exactly how things are being evaluated.

This leads to some potentially unintuitive behavior. For example, if you try to get the length of the input—and actually use it—the list is forced before you get to printing it, giving you the behavior you want:

main = do
  contents <- getContents
  let n = length contents
  print n
  putStr contents

but if you move the print n after the putStr, you go back to the original behavior because n does not get forced until after printing the input (even though n still got defined before putStr was used):

main = do
  contents <- getContents
  let n = length contents
  putStr contents
  print n

Normally, this sort of thing is not a problem because evaluation order doesn't change the behavior of your code (although it can affect performance). Lazy IO turns it into a correctness issue by piercing the abstraction layer.

This also gives us a hint on how to fix your issue: we need some way of forcing contents before printing it. As we saw, we can do this with length, because length needs to traverse the whole list before computing its result. Instead of printing the length, we can use seq, which evaluates its left-hand argument whenever the overall expression is evaluated, then throws that value away and returns the right-hand argument:

main = do
  contents <- getContents
  let n = length contents
  n `seq` putStr contents

At the same time, this is still a bit ugly because we're using length just to traverse the list, not because we actually care about its result. What we would really like is a function that traverses the list just enough to evaluate it fully, without doing anything else. Happily, this is exactly what deepseq (from Control.DeepSeq in the deepseq package) does, and it works for many data structures, not just lists:

import Control.DeepSeq
import System.IO

main = do
  contents <- getContents
  contents `deepseq` putStr contents
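
A variation on the same idea, offered as a sketch rather than as part of the original answer: force (also from Control.DeepSeq) combined with evaluate from Control.Exception performs the forcing as its own IO step, so the point at which the input gets read is explicit in the do block. The name getContentsStrict is made up for this example:

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- Hypothetical helper: read all of stdin and fully evaluate the
-- resulting String before handing it back.
getContentsStrict :: IO String
getContentsStrict = getContents >>= evaluate . force

main :: IO ()
main = do
  contents <- getContentsStrict  -- returns only once EOF has been reached
  putStr contents
```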
Tikhon Jelvis

This is a problem of lazy I/O. One simple solution is to use strict I/O, such as via ByteStrings:

import qualified Data.ByteString as S

main :: IO ()
main = S.getContents >>= S.putStr
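
ByteString works on raw bytes. If the input should be handled as text instead, a similar sketch is possible with strict Text, assuming the text package is available. Data.Text.IO.getContents likewise reads all of stdin before returning:

```haskell
import qualified Data.Text.IO as TIO

main :: IO ()
main = do
  -- Strict read: nothing is returned until stdin reaches EOF.
  contents <- TIO.getContents
  TIO.putStr contents
```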
Michael Snoyman
  • Tikhon's answer is far more complete than mine; it should be accepted. I'll leave mine in place since it gives what I think is a more canonical solution to the problem (using strict ByteStrings instead of forcing a lazy thunk). – Michael Snoyman Feb 05 '15 at 21:37

You can use the replacement functions from the strict package (link):

import qualified System.IO.Strict as S

main = do
  contents <- S.getContents
  putStrLn contents

Note that for reading there isn't a need to set buffering. Buffering really only helps when writing to files. See this answer (link) for more details.

The definition of the strict version of hGetContents in System.IO.Strict is pretty simple:

hGetContents    :: IO.Handle -> IO.IO String
hGetContents h  = IO.hGetContents h >>= \s -> length s `seq` return s

That is, it forces everything to be read into memory by calling length on the string returned by the standard (lazy) version of hGetContents.
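
If you would rather not add a dependency, the same trick can be written locally using only base. This is a sketch modeled on the definition above; the names hGetContentsStrict and getContentsStrict are made up here and are not part of any library:

```haskell
import System.IO

-- Hypothetical local equivalent of the strict package's hGetContents:
-- computing the length walks the whole list, which reads the handle to EOF.
hGetContentsStrict :: Handle -> IO String
hGetContentsStrict h = do
  s <- hGetContents h
  length s `seq` return s

-- Hypothetical convenience wrapper for stdin.
getContentsStrict :: IO String
getContentsStrict = hGetContentsStrict stdin

main :: IO ()
main = getContentsStrict >>= putStr
```

Note that length forces only the list's spine. For a String obtained from hGetContents that is enough, because reaching the end of the list requires reading the handle to EOF.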

ErikR