Haskell: Handling deadlocked self-referential lists

Question

Is there any useful reason why GHC allows the following to block forever:

list = 1 : tail list

It seems with a bit of sophistication in the list iterator/generator we should be able to do something more useful:

1. Return error "Infinitely blocking list"
2. Return [1,1]

Explaining 2: it seems possible that, when entering the generator to get element N, we could limit all self-references inside the generator to the list ending at N-1 (we notice the read of element N inside the scope of generating element N, and return end-of-list). It's a sort of simple deadlock detection using scopes.

Clearly this isn't that useful for the toy example above, but it may allow for more useful/elegant finite, self-referential list definitions, for example:

primes = filter (\x -> none ((==0).mod x) primes) [2..]

Note that either change should only affect list generators that would currently result in an infinite-block, so they seem backward compatible language changes.

Ignoring the GHC-complexity required to make such a change for a moment, would this behavior break any existing language behavior that I am missing? Any other thoughts on the "elegance" of this change?

Also see another BFS example below that could benefit. To me, this seems more functional/elegant than some other solutions, since I only need to define what a bfsList is, not how to generate it (i.e. I don't need to specify a terminating condition):

bfs :: (a -> Bool) -> (a -> [a]) -> [a] -> Maybe a
bfs predf expandf xs = find predf bfsList
    where bfsList = xs ++ concatMap expandf bfsList

Whilst this may be simple for this case, in general you're asking the compiler to solve the halting problem. But in any case, why would the compiler not represent the list as the code describes it? Returning `[1,1]` in that example is completely spurious, since, for instance `[1,32132132]` is equally valid. — AJF, Sep 28 '17 at 20:56
given `list = 1 : tail list` i.e. `list = 1 : t where t = tail list`, what is `t`? It's `t = tail list = tail (1 : t) = t`. Even from equational reasoning `t = t` is trivially valid for any value of `t`, so it really doesn't say anything about `t` at all, so there would be no justification to give it any specific value. Trying to print `list` could result in `[1,***ERROR: black hole detected***` or `[1,` and getting stuck in an infinite loop. Closing the list as `[1]` would mean `t = []`, and this too is unjustified (so, wrong thing to do, from equational reasoning standpoint). — Will Ness, Sep 28 '17 at 21:22
`list = 1 : tail list` doesn't block anything in haskell, nor ghc haskell..it's a perfectly valid definition of a value with well-defined operations on it that work and are useful. — Justin L., Sep 28 '17 at 21:33
Solving recursive equations means taking a least fixed point for a suitable function in some domain. Here least means "least defined": anything more defined than that essentially involves inventing values out of thin air. E.g. solving `let (a,b,c,d) = (b,1,d,c)` must pick `a=b=1`, but has infinitely many choices for `c,d`. Since we can't solve the halting problem in general, the only sensible choice is `c=d=bottom` where `bottom` is non termination/infinite recursion/infinite loop. — chi, Sep 28 '17 at 22:47

luqui · Answer 1 · 2017-09-29T02:28:03.637

Here is a denotational perspective on how list = 1 : ⊥.

First, a little background. In Haskell, values are partially ordered by "definedness", where values involving ⊥ ("bottom") are less defined than ones without. So

⊥ is less defined than 1 : ⊥
1 : ⊥ is less defined than 1 : 2 : 3 : []

But it's a partial order, so

1 : ⊥ is not less defined than 2 : 3 : ⊥, nor is it more defined.

even though the second list is longer. 1 : ⊥ is only less defined than lists that start with 1. I highly recommend reading about denotational semantics of Haskell.
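As a small illustration of a partially defined list, here is a sketch that uses `undefined` as a stand-in for ⊥. The names `partial`, `headOfPartial` and `firstOnly` are made up for this example. It only shows that the defined part of 1 : ⊥ can be observed without touching the undefined tail.

```haskell
-- A partially defined list: the head is known, the tail is bottom.
partial :: [Int]
partial = 1 : undefined

-- Observing the head never forces the undefined tail.
headOfPartial :: Int
headOfPartial = head partial

-- take 1 only needs the first cons cell, so it also succeeds.
firstOnly :: [Int]
firstOnly = take 1 partial
```

Asking for more than the defined part, for example `length partial`, would force the tail and fail.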

Now to your question. Look at

list = 1 : tail list

as an equation to be solved instead of a "function declaration". We rewrite it like this:

list = ((1 :) . tail) list

Viewing it this way, we see that list is a fixed point

list = f list

where f = (1 :) . tail. In Haskell semantics, recursive values are solved by finding the least fixed point according to the above ordering.

The way to find this is very simple. If you start with ⊥ and then apply the function over and over, you will find an increasing chain of values. The point at which the chain stops changing will be the least fixed point (technically, it will be the limit of the chain, since it might never stop changing).

Starting with ⊥,

f ⊥ = ((1 :) . tail) ⊥ = 1 : tail ⊥

we see that ⊥ is not already a fixed point because we didn't get ⊥ out the other end. So let's try again with what we got out:

f (1 : tail ⊥) = ((1 :) . tail) (1 : tail ⊥)
               = 1 : tail (1 : tail ⊥)
               = 1 : tail ⊥

Oh look, it's a fixed point: we got the same thing out that we put in. (Since tail ⊥ = ⊥, this value is just 1 : ⊥.)

The important point here is that it's the least one. Your solution [1,1] = 1:1:[] is also a fixed point, so it solves the equation:

f (1:1:[]) = ((1 :) . tail) (1:1:[]) 
           = 1 : tail (1:1:[])
           = 1:1:[]

But of course, every list that starts with 1 is a solution, and it's unclear how we should choose between them. However, the one we found by recursion 1:⊥ is less defined than all of them, it delivers no more information than required by the equation, and that is the one that is specified by the language.
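The fixed-point reading can be tried out directly with `fix` from `Data.Function`. This is only a sketch: the names `f`, `list`, `firstElement` and `oneOneIsFixedPoint` are chosen for the example, and the element type is fixed to `Int` for simplicity.

```haskell
import Data.Function (fix)

-- The function whose least fixed point defines the list.
f :: [Int] -> [Int]
f = (1 :) . tail

-- fix f ties the knot, giving the same value as list = 1 : tail list.
list :: [Int]
list = fix f

-- Only the head is defined, so only the head is safe to inspect.
firstElement :: Int
firstElement = head list

-- [1,1] also satisfies the equation, but it is not the least solution.
oneOneIsFixedPoint :: Bool
oneOneIsFixedPoint = f [1, 1] == [1, 1]
```

Here `firstElement` evaluates to 1 and `oneOneIsFixedPoint` to True, while forcing the tail of `list` would not terminate normally.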

K. A. Buhr · Answer 2 · 2017-09-29T20:34:47.207

Even though list loops forever under GHCi, a proper binary compiled with GHC does detect the loop and signals an error. If you compile and run:

list = 1 : tail list
main = print list

it terminates with the error message:

Loop: <<loop>>

It does the same thing with your primes example.
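For comparison, a self-referential primes list does work under the existing semantics as long as each element only depends on elements that are already determined. The following is a sketch of that common idiom, not the definition from the question: it uses `all` and `takeWhile` from the Prelude, and the helper name `isPrime` is an assumption for the example.

```haskell
-- A productive self-referential definition: testing x only consults
-- primes p with p * p <= x, and those have already been produced.
primes :: [Int]
primes = 2 : filter isPrime [3 ..]
  where
    isPrime x =
      all (\p -> x `mod` p /= 0) (takeWhile (\p -> p * p <= x) primes)
```

With this definition, `take 10 primes` gives [2,3,5,7,11,13,17,19,23,29]. The `takeWhile` bound is what stops the list from ever reading the element it is currently trying to compute.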

As others have noted, GHC doesn't detect all possible loops. If it did, then it would solve the Halting Problem, and that would probably make Haskell much more popular.

The reason it returns an error (or "gets stuck") instead of returning [1,1] is because the expression:

list = 1 : tail list

has well defined semantics in the Haskell language. These semantics assign it a value, and this value is "bottom" (or "error" or the symbol _|_), just as surely as the value of head [1,2,3] is 1.

(Well, technically, the value of list is 1 : _|_ which is "almost bottom". This is what @Justin L. was talking about in his comment. I've tried to give an explanation of why it has this value below.)
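To make the "almost bottom" point concrete, here is a minimal sketch of operations that only look at the defined part of the list. The names `firstElem`, `isEmpty` and `prefix` are invented for the example, and the element type is fixed to `Int`.

```haskell
-- The definition from the question, with an explicit type.
list :: [Int]
list = 1 : tail list

-- The outermost constructor and the head are defined.
firstElem :: Int
firstElem = head list

isEmpty :: Bool
isEmpty = null list

-- take 1 stops after the first cons cell and never forces the tail.
prefix :: [Int]
prefix = take 1 list
```

These evaluate to 1, False and [1] respectively. Anything that forces the tail, such as `length list` or `list !! 1`, runs into the loop.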

You may not see the use of a program or expression that returns bottom, and so may see no harm in assigning non-bottom semantics to such expressions on the grounds that doing so is "backwards compatible". However, most people in the Haskell community (the language designers, compiler developers, and experienced users) will disagree with you, so don't expect to make much progress with them.

As for the specific new semantics you are proposing, they are unclear. Why isn't the value of list equal to [1]? It seems to me that when I am entering the "generator" to get element n=1 (zero indexed, so the second element) and evaluate tail list, then the list ending at element n-1=0 is [1] which has tail equal to [], so I think I should get the following, right?

list = 1 : tail list
     = 1 : tail [1]   -- use list-so-far
     = 1 : []
     = [1]

Why the value is (almost) bottom

Here's why the value of list is (almost) bottom, according to the semantics of standard Haskell (but see note at the end).

For reference, the definition of tail is, effectively:

tail l = case l of _:xs -> xs
                   [] -> error "ack, you dummy!"

Let's try to "fully" evaluate list using Haskell semantics:

-- evaluating `list` using definition of `list`
list = 1 : tail list

-- evaluating `tail list` using definition of `tail`
list = 1 : case list of _:xs -> xs
                        ...
-- evaluating case construct requires matching `list` to
-- a pattern, this requires evaluation of `list` using its defn
list = 1 : case (1 : tail list) of _:xs -> xs
                                   ...
-- case pattern match succeeds
list = 1 : let xs = tail list in xs    -- just to be clear
     = 1 : tail list

-- awesome, now all we need to do is evaluate:
list = 1 : tail list
-- ummm, Houston, we have a problem

and that infinite loop at the end is why the expression is "almost bottom".

Note: There are actually several different sets of Haskell semantics, different methods of calculating the values of Haskell expressions. The gold standard is the denotational semantics described in @luqui's answer. The ones I'm using above are, at best, a form of the "informal semantics" described in the Haskell report, but they're good enough to get the right answer.
