Why can't GHC reason about some infinite lists?

Question

This recent question got me thinking about Haskell's ability to work with infinite lists. There are plenty of other questions and answers about infinite lists on StackOverflow, and I understand why we can't have a general solution for all infinite lists, but why can't Haskell reason about some infinite lists?

Let's use the example from the first linked question:

list1 = [1..]
list2 = [x | x <- list1, x <= 4]
print list2
$ [1,2,3,4

@user2297560 writes in the comments:

Pretend you're GHCI. Your user gives you an infinite list and asks you to find all the values in that list that are less than or equal to 4. How would you go about doing it? (Keep in mind that you don't know that the list is in order.)

In this case, the user didn't give you an infinite list. GHC generated it! In fact, it generated it following its own rules. The Haskell 2010 Standard states the following:

enumFrom       :: a -> [a]            -- [n..]

For the types Int and Integer, the enumeration functions have the following meaning:

The sequence enumFrom e1 is the list [e1,e1 + 1,e1 + 2,…].

In his answer to the other question, @chepner writes:

You know that the list is monotonically increasing, but Haskell does not.

The statements these users made don't seem to line up with the standard to me. Haskell created the list in an ordered fashion using a monotonic increase. Haskell should know that the list is both ordered and monotonic. So why can't it reason about this infinite list to turn [x | x <- list1, x <= 4] into takeWhile (<= 4) list1 automatically?

Good q. Just wanted to link this in here. http://stackoverflow.com/questions/40145318 — mike3996, Feb 23 '17 at 16:27
Haskell won't infer anything unless the developers of the language program it to make the inference. There is no free lunch. Obviously you *could* design the language so that certain lists are automatically marked as being increasing, but that would increase the complexity of what is already a pretty complicated language. — John Coleman, Feb 23 '17 at 16:32
@JohnColeman That's fair, it just seems like a logical jump to me in managing infinite lists. I assumed the designers/developers of Haskell are much more proficient in language design/abstract mathematics and there was a good theoretical reason as to why this hasn't been done rather than a pragmatic one. — jkeuhlen, Feb 23 '17 at 16:34
List comprehensions work uniformly for all types of lists and do not take into account any further structure (i.e. the fact that `[1..]` has type `Enum t, Num t => [t]` instead of just `[t]`). Secondly the predicate `x <= 4` is opaque to the filtering mechanism and is just a function `t -> Bool` so no information about its behaviour is available as the list elements increase. — Lee, Feb 23 '17 at 16:39
As others mentioned, once a list is created, the information about _how_ it was created is lost. You need to preserve it somehow; the typical way is to create your own type that preserves the ordering information. Make your type conform to the `List` typeclass, and you could use it everywhere where a normal `Ord a => List a` fits. — 9000, Feb 23 '17 at 17:00

Alec · Accepted Answer · 2017-02-24T01:13:11.987

Theoretically, one could imagine a rewrite rule such as

{-# RULES
  "filterEnumFrom" forall (n :: Int) (m :: Int).
                     filter (< n) (enumFrom m) = [m..(n-1)]
  #-}

That would automatically convert expressions such as filter (< 4) (enumFrom 1) to [1..3]. So it is possible. There is a glaring problem though: any variation from this exact syntactic pattern won't work. The result is that you end up defining a bunch of rules, and you can no longer be sure whether they are firing or not. If you can't rely on the rules, you eventually just don't use them. (Also, note I've specialized the rule to Ints; as was briefly posted as a comment, this may break down in subtle ways for other types.)
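To make the remark about other types concrete, here is a small sketch that writes the two sides of the rule as ordinary functions, generalised beyond Int. The names ruleLhs and ruleRhs are made up for this illustration, and no RULES pragma is involved; the point is only to compare what each side evaluates to.

```haskell
-- Left-hand side of the rule, with the Int restriction dropped.
-- (ruleLhs is a hypothetical name used only for this comparison.)
ruleLhs :: (Ord a, Enum a) => a -> a -> [a]
ruleLhs n m = filter (< n) (enumFrom m)

-- Right-hand side of the rule, with the Int restriction dropped.
-- (ruleRhs is likewise hypothetical.)
ruleRhs :: (Num a, Enum a) => a -> a -> [a]
ruleRhs n m = [m .. (n - 1)]
```

For Int the two sides agree on the example above: take 3 (ruleLhs 4 (1 :: Int)) and ruleRhs 4 (1 :: Int) are both [1,2,3]. For Double they can differ, because the Report defines [a..b] on fractional types to run up to b + 1/2: take 3 (ruleLhs 3.2 (1 :: Double)) is [1.0,2.0,3.0], while ruleRhs 3.2 (1 :: Double) is only [1.0,2.0]. A rule that fired at that type would silently change the result.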

At the end of the day, to perform more advanced analysis, GHC would have to have some tracking information attached to lists to say how they were generated. That would either make lists less lightweight of an abstraction or mean that GHC would have some special machinery in it just for optimizing lists at compile time. Neither of these options is nice.

That said, you can always add your own tracking information by making a list type on top of lists.

{-# LANGUAGE GADTs #-}

data List a where
  EnumFromTo   :: Enum a => a -> Maybe a -> List a
  Filter       :: (a -> Bool) -> List a -> List a
  Unstructured :: [a] -> List a

This may end up being easier to optimize.
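As a rough sketch of how the extra structure might be used, the code below adds two helpers that are not part of the answer: toList, which interprets a List as an ordinary lazy list, and atMost, which keeps the elements up to a bound. Both names are hypothetical, and atMost is deliberately restricted to Integer, where enumFrom is known to be increasing, so that tightening the upper bound is safe.

```haskell
{-# LANGUAGE GADTs #-}

-- The list type from the answer, repeated so the sketch stands alone.
data List a where
  EnumFromTo   :: Enum a => a -> Maybe a -> List a
  Filter       :: (a -> Bool) -> List a -> List a
  Unstructured :: [a] -> List a

-- Interpret the structured description as a plain lazy list.
-- (toList is a hypothetical helper, not something from the answer.)
toList :: List a -> [a]
toList (EnumFromTo lo Nothing)   = enumFrom lo
toList (EnumFromTo lo (Just hi)) = enumFromTo lo hi
toList (Filter p xs)             = filter p (toList xs)
toList (Unstructured xs)         = xs

-- Keep only the elements <= bound (also hypothetical).
-- For an enumeration we can tighten the upper bound instead of filtering,
-- which is what makes the infinite case terminate. Anything else falls
-- back to an ordinary Filter node.
atMost :: Integer -> List Integer -> List Integer
atMost bound (EnumFromTo lo Nothing)   = EnumFromTo lo (Just bound)
atMost bound (EnumFromTo lo (Just hi)) = EnumFromTo lo (Just (min hi bound))
atMost bound other                     = Filter (<= bound) other
```

With these definitions, toList (atMost 4 (EnumFromTo 1 Nothing)) evaluates to [1,2,3,4] and terminates, because the unbounded enumeration is replaced by a bounded one before any list is built. For input with no known structure, such as atMost 4 (Unstructured [5,1,9,3]), the fallback is a plain filter, which gives [1,3].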

Benjamin Hodgson · Answer 2 · 2017-02-23T17:37:25.500

So why can't it reason about this infinite list to turn [x | x <- list1, x <= 4] into takeWhile (<= 4) list1 automatically?

The answer isn't any more specific than "It doesn't use takeWhile because it doesn't use takeWhile". The spec says:

Translation: List comprehensions satisfy these identities, which may be used as a translation into the kernel:

[ e | True ]         = [ e ]
[ e | q ]            = [ e | q, True ]
[ e | b, Q ]         = if b then [ e | Q ] else []
[ e | p <- l, Q ]    = let ok p = [ e | Q ]
                           ok _ = []
                       in concatMap ok l
[ e | let decls, Q ] = let decls in [ e | Q ]

That is, the meaning of a list comprehension is given by translation into a simpler language with if-expressions, let-bindings, and calls to concatMap. We can figure out the meaning of your example by translating it through the following steps:

[x | x <- [1..], x <= 4]

-- apply rule 4 --
let ok x = [ x | x <= 4 ]
    ok _ = []
in concatMap ok [1..]

-- eliminate unreachable clause in ok --
let ok x = [ x | x <= 4 ]
in concatMap ok [1..]

-- apply rule 2 --
let ok x = [ x | x <= 4, True ]
in concatMap ok [1..]

-- apply rule 3 --
let ok x = if x <= 4 then [ x | True ] else []
in concatMap ok [1..]

-- apply rule 1 --
let ok x = if x <= 4 then [ x ] else []
in concatMap ok [1..]

-- inline ok --
concatMap (\x -> if x <= 4 then [ x ] else []) [1..]
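To see the consequence of that translation, here is a small sketch that puts the comprehension, its translated form, and the takeWhile version side by side. The three top-level names are invented for this comparison; the element type is fixed to Integer to keep things simple.

```haskell
-- The comprehension from the question.
viaComprehension :: [Integer]
viaComprehension = [x | x <- [1 ..], x <= 4]

-- The last step of the translation above.
viaConcatMap :: [Integer]
viaConcatMap = concatMap (\x -> if x <= 4 then [x] else []) [1 ..]

-- What the question hoped for: stop at the first element that fails the test.
viaTakeWhile :: [Integer]
viaTakeWhile = takeWhile (<= 4) [1 ..]
```

take 4 viaComprehension and take 4 viaConcatMap both give [1,2,3,4], and viaTakeWhile is exactly [1,2,3,4]. Asking for a fifth element of either of the first two never returns: concatMap has to keep applying the function to 5, 6, 7, and so on, since nothing in the translation says that the test can never succeed again.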
