Passing around values that represent "the world" is one way to make a pure model for doing IO (and other side effects) in pure declarative programming.
The "problem" with pure declarative (not just functional) programming is obvious. Pure declarative programming provides a model of computation. These models can express any possible computation, but in the real world we use programs to have computers do things that aren't computation in a theoretical sense: taking input, rendering to displays, reading and writing storage, using networks, controlling robots, etc, etc. You can directly model almost all such programs as computation (e.g. what output should be written to a file given this input is a computation), but the actual interactions with things outside the program just aren't part of the pure model.
That's actually true of imperative programming too. The "model" of computation that is the C programming language provides no way to write to files, read from keyboards, or anything. But the solution in imperative programming is trivial. Performing a computation in the imperative model is executing a sequence of instructions, and what each instruction actually does depends on the whole environment of the program at the time it is executed. So you can just provide "magic" instructions that carry out your IO actions when they are executed. And since imperative programmers are used to thinking about their programs operationally [1], this fits very naturally with what they're already doing.
But in all pure models of computation, what a given unit of computation (function, predicate, etc) will do should only depend on its inputs, not on some arbitrary environment that can be different every time. So not only performing IO actions but also implementing computations which depend on the universe outside the program is impossible.
The idea for the solution is fairly simple though. You build a model for how IO actions work within the whole pure model of computation. Then all the principles and theories that apply to the pure model in general will also apply to the part of it that models IO. Then, within the language or library implementation (because it's not expressible in the language itself), you hook up manipulations of the IO model to actual IO actions.
This brings us to passing around a value that represents the world. For example, a "hello world" program in Mercury looks like this:
:- pred main(io::di, io::uo) is det.
main(InitialWorld, FinalWorld) :-
    print("Hello world!", InitialWorld, TmpWorld),
    nl(TmpWorld, FinalWorld).
The program is given InitialWorld, a value in the type io which represents the entire universe outside the program. It passes this world to print, which gives it back TmpWorld, the world that is like InitialWorld but in which "Hello world!" has been printed to the terminal, and whatever else has happened in the meantime since InitialWorld was passed to main is also incorporated. It then passes TmpWorld to nl, which gives back FinalWorld (a world that is very like TmpWorld but it incorporates the printing of the newline, plus any other effects that happened in the meantime). FinalWorld is the final state of the world passed out of main back to the operating system.
Of course, we're not really passing around the entire universe as a value in the program. In the underlying implementation there usually isn't a value of type io at all, because there's no information that's useful to actually pass around; it all exists outside the program. But using the model where we pass around io values allows us to program as if the entire universe was an input and output of every operation that is affected by it (and consequently see that any operation that doesn't take an input and output io argument can't be affected by the external world).
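To make the threading concrete, here is a minimal sketch of the same world-passing shape written in Haskell. Everything in it is an assumption for illustration: the World type is a toy stand-in that merely records what has been printed, and printW, nlW and helloWorld are hypothetical names. It is not how Mercury or Haskell actually implement IO.

```haskell
-- A toy "world": all we track is the text written to the terminal so far.
-- (Hypothetical stand-in; a real io value carries no such data.)
newtype World = World { terminal :: String }
  deriving (Eq, Show)

-- Each operation takes the current world and returns the next one.
printW :: String -> World -> World
printW s (World out) = World (out ++ s)

nlW :: World -> World
nlW (World out) = World (out ++ "\n")

-- Same shape as the Mercury main: InitialWorld -> TmpWorld -> FinalWorld.
helloWorld :: World -> World
helloWorld initialWorld =
  let tmpWorld   = printW "Hello world!" initialWorld
      finalWorld = nlW tmpWorld
  in  finalWorld
```

Evaluating terminal (helloWorld (World "")) gives back the printed text. Note that nothing in this sketch stops you from passing initialWorld to two different calls, which makes no sense for the real universe; a real design has to rule that out somehow.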
And in fact, usually you wouldn't actually even think of programs that do IO as if they're passing around the universe. In real Mercury code you'd use the "state variable" syntactic sugar, and write the above program like this:
:- pred main(io::di, io::uo) is det.
main(!IO) :-
    print("Hello world!", !IO),
    nl(!IO).
The exclamation point syntax signifies that !IO really stands for two arguments, IO_X and IO_Y, where the X and Y parts are automatically filled in by the compiler such that the state variable is "threaded" through the goals in the order in which they are written. This is not just useful in the context of IO, btw; state variables are really handy syntactic sugar to have in Mercury.
So the programmer actually tends to think of this as a sequence of steps (depending on and affecting external state) that are executed in the order in which they are written. !IO almost becomes a magic tag that just marks the calls to which this applies.
In Haskell, the pure model for IO is a monad, and a "hello world" program looks like this:
main :: IO ()
main = putStrLn "Hello world!"
One way to interpret the IO monad is similarly to the State monad; it's automatically threading a state value through, and every value in the monad can depend on or affect this state. Only in the case of IO the state being threaded is the entire universe, as in the Mercury program. With Mercury's state variables and Haskell's do notation, the two approaches end up looking quite similar, with the "world" automatically threaded through in a way that respects the order in which the calls were written in the source code, but still having IO actions explicitly marked.
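As a rough illustration of the State-like reading, here is a sketch of a monad that threads a toy world behind the scenes. The names World, WorldIO, putStrLnW and greet are all hypothetical, and the world is assumed to be nothing more than the terminal output so far; GHC's real IO type is not defined this way in any form you can observe from pure code.

```haskell
-- Toy world: just the terminal output so far (an assumption for illustration).
newtype World = World { terminal :: String }
  deriving (Eq, Show)

-- A State-like monad whose state is the world.
newtype WorldIO a = WorldIO { runWorldIO :: World -> (a, World) }

instance Functor WorldIO where
  fmap f (WorldIO g) = WorldIO $ \w ->
    let (a, w') = g w in (f a, w')

instance Applicative WorldIO where
  pure a = WorldIO $ \w -> (a, w)
  WorldIO f <*> WorldIO g = WorldIO $ \w ->
    let (h, w')  = f w
        (a, w'') = g w'
    in  (h a, w'')

instance Monad WorldIO where
  WorldIO g >>= k = WorldIO $ \w ->
    let (a, w') = g w
    in  runWorldIO (k a) w'

-- Hypothetical counterpart of putStrLn for the toy world.
putStrLnW :: String -> WorldIO ()
putStrLnW s = WorldIO $ \(World out) -> ((), World (out ++ s ++ "\n"))

-- do notation threads the world in source order; it never appears by name.
greet :: WorldIO ()
greet = do
  putStrLnW "Hello world!"
  putStrLnW "Goodbye!"
```

Running runWorldIO greet (World "") returns the unit result paired with the final world, much as !IO in Mercury stands for an input and an output world that the compiler wires up for you.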
As explained quite well in sacundim's answer, another way to interpret Haskell's IO monad as a model for IO-y computations is to imagine that putStrLn "Hello world!" isn't in fact a computation through which "the universe" needs to be threaded, but rather that putStrLn "Hello world!" is itself a data structure describing an IO action that could be taken. On this understanding what programs in the IO monad are doing is using pure Haskell programs to generate at runtime an imperative program. In pure Haskell there's no way to actually execute that program, but since main is of type IO (), main itself evaluates to such a program, and we just know operationally that the Haskell runtime will execute the main program.
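The "data structure describing an IO action" reading can be sketched with an ordinary algebraic data type. The Action type, greeter, run and simulate below are hypothetical names invented for this example; Haskell's real IO type is abstract and is not defined like this. The point is only that building the description is pure, and executing it is a separate step.

```haskell
-- A tiny description language for IO actions (hypothetical, for illustration).
data Action
  = PutLine String Action       -- print a line, then continue
  | GetLine (String -> Action)  -- read a line, then continue with it
  | Done                        -- nothing left to do

-- Pure code that *builds* a program; nothing is executed here.
greeter :: Action
greeter =
  PutLine "What is your name?" $
  GetLine $ \name ->
  PutLine ("Hello, " ++ name ++ "!") Done

-- Plays the role of the runtime: the only place where real IO happens.
run :: Action -> IO ()
run (PutLine s next) = putStrLn s >> run next
run (GetLine k)      = getLine >>= run . k
run Done             = pure ()

-- A pure interpreter for the same description:
-- feed it canned input lines and collect the output lines.
simulate :: [String] -> Action -> [String]
simulate ins       (PutLine s next) = s : simulate ins next
simulate (i : ins) (GetLine k)      = simulate ins (k i)
simulate []        (GetLine _)      = []  -- out of input: stop
simulate _         Done             = []
```

Because greeter is just a value, the same description can be handed to run to perform real IO or to simulate to inspect it purely, which is the sense in which the pure program only generates an imperative one.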
Since we're hooking up these pure models of IO to actual interactions with the outside world, we need to be a little careful. We're programming as if the entire universe was a value we can pass around the same as other values. But other values can be passed into multiple different calls, stored in polymorphic containers, and many other things that don't make any sense in terms of the actual universe. So we need some restrictions that prevent us from doing anything with "the world" in the model that doesn't correspond to anything that can actually be done to the real world.
The approach taken in Mercury is to use unique modes to enforce that the io value remains unique. That's why the input and output world were declared as io::di and io::uo respectively; it's a shorthand for declaring that the type of the first parameter is io and its mode is di (short for "destructive input"), while the type of the second parameter is io and its mode is uo (short for "unique output"). Since io is an abstract type, there's no way to construct new ones, so the only way to meet the uniqueness requirement is to always pass the io value to at most one call, which must also give you back a unique io value, and then to output the final io value from the last thing you call.
The approach taken in Haskell is to use the monad interface to allow values in the IO monad to be constructed from pure data and from other IO values, but not expose any functions on IO values that would allow you to "extract" pure data from the IO monad. This means that only the IO values incorporated into main will ever do anything, and those actions must be correctly sequenced.
I mentioned before that programmers doing IO in a pure language still tend to think operationally about most of their IO. So why go to all this trouble to come up with a pure model for IO if we're only going to think about it the same way imperative programmers do? The big advantage is that now all the theories/code/whatever that apply to all of the language apply to IO code as well.
For example, in Mercury the equivalent of fold processes a list element-by-element to build up an accumulator value, which means fold takes an input/output pair of variables of some arbitrary type as the accumulator (this is a very common pattern in the Mercury standard library, and is why I said state variable syntax often turns out to be very handy in other contexts than IO). Since "the world" appears in Mercury programs explicitly as a value in the type io, it's possible to use io values as the accumulator! Printing a list of strings in Mercury is as simple as foldl(print, MyStrings, !IO). Similarly in Haskell, generic monad/functor code works just fine on IO values. We get a whole lot of "higher-order" IO operations that would have to be implemented anew specialised to IO in a language that handles IO by some completely special mechanism.
Also, since IO doesn't break the pure model, theories that are true of the computational model remain true even in the presence of IO. Neither the programmer nor program-analysis tools have to consider whether IO might be involved when reasoning about a piece of code. In languages like Scala, for example, even though much "normal" code is in fact pure, optimizations and implementation techniques that work on pure code are generally inapplicable, because the compiler has to presume that every single call might contain IO or other effects.
[1] Thinking about programs operationally means understanding them in terms of the operations the computer will carry out when executing them.