Why does this list comprehension fail?

Question

I'm trying to select unique elements from a list like this:

x = [1,1,2,3,4]
s = [e | e <- x, not (e `elem` s)]

It doesn't produce errors, but when I try to read from s it seems like the program hangs. Why?

Plus, what's the right way to do this?

Does the program crash, or does it hang? I can imagine reasons why it wouldn't terminate, but not why it would crash (unless it were some kind of out of memory/stack space error). — Joshua Taylor, Sep 10 '13 at 20:49
@JoshuaTaylor I guess it hangs (I updated the text of the original question). It stops accepting input (in the console - I'm using emacs) and doesn't produce any output when I read from `s`. — Ganymede, Sep 10 '13 at 20:51
your code is equivalent to ``s = filter (`notElem`s) x`` (which clearly causes an unbounded recursion). -- or as closed expression, it is ``mks [] s = s; mks (y:ys) s = [y | y`notElem`s] ++ mks ys s`` so that `s = mks x s` (which is suggestive of the fix ``mks [] s = []; mks (y:ys) s = let a=[y | y`notElem`s]; s2=a++s in a++mks ys s2`` with `s = mks x []`). — Will Ness, Sep 11 '13 at 17:08

Joshua Taylor · Accepted Answer · 2013-09-12T12:30:44.680

17

I'm not much of a Haskell-er, but this seems like you've just coded up something sort of like¹ Russell's paradox. Aren't you asking for a list s whose elements are those that are in x, but not in s?

s = [e | e <- [1,1,2,3,4], not (e `elem` s)]

So, consider what happens when you ask for the first element of s. The first element drawn from x is 1, so 1 will be the first element of s if not (1 `elem` s). Well, is (1 `elem` s)? We can check by iterating over the elements of s and seeing whether 1 appears. Well, let's start with the first element of s…

In general, suppose that some n is an element of s. Then what must be true? n must be an element of x (easy to check), and also not an element of s. But we supposed that it was an element of s. This is a contradiction. Therefore, no n can be an element of s, so s must be the empty list. Unfortunately, the Haskell compiler isn't doing the proof that we just did; it's trying to programmatically compute the elements of s.

To remove duplicate items from a list, you want the function that Neil Brown recommended in a comment, nub from Data.List:

nub :: Eq a => [a] -> [a]

O(n^2). The nub function removes duplicate elements from a list. In particular, it keeps only the first occurrence of each element. (The name nub means ‘essence’.) It is a special case of nubBy, which allows the programmer to supply their own equality test.
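As a minimal sketch of how this applies to the list from the question (the names x and s mirror the question; the Int annotations are an assumption added so the snippet compiles on its own):

```haskell
import Data.List (nub)

x :: [Int]
x = [1, 1, 2, 3, 4]

-- nub keeps the first occurrence of each element and drops later repeats.
s :: [Int]
s = nub x

main :: IO ()
main = print s  -- prints [1,2,3,4]
```

Without the import line, GHC reports `Not in scope: 'nub'`, which is the error mentioned in the comments.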

¹ It's not actually Russell's paradox; Russell's paradox is about a set that contains only those sets that don't contain themselves. That set can't exist, because if it contains itself, then it must not contain itself, and if it does not contain itself, then it must contain itself.

edited Sep 12 '13 at 12:30

answered Sep 10 '13 at 20:51

Joshua Taylor


Thanks, I just read up on Russell's paradox - valuable info. I see why it doesn't work now. Now I just have to get it to work... – Ganymede Sep 10 '13 at 20:57

The list `s` is defined once. So if you ask if an element is in `s` it has to evaluate the list (at least until it finds that element). I suspect you want a recursive function to do this, e.g. how `nub` is defined. This post may help you: http://buffered.io/posts/a-better-nub – Neil Brown Sep 10 '13 at 21:00
Well, what would you consider a solution that “works”? I've updated my answer, explaining why `s` could only be the empty list here, but I'm not sure what sort of results you're expecting from your program. – Joshua Taylor Sep 10 '13 at 21:00
@NeilBrown Thanks, but `nub [1,1,2,3,4]` leads to the error `Not in scope: 'nub'`. @JoshuaTaylor By "work" I mean successfully selecting unique elements from the list `x` – Ganymede Sep 10 '13 at 21:03
@Ganymede Ah, sorry, I missed the bit about selecting unique elements from a list. The [nub](http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-List.html#v%3Anub) that @NeilBrown linked to _is_ part of the standard library. The author of the blog post defined a more efficient version for a more restricted case, but the general `nub` is built-in. – Joshua Taylor Sep 10 '13 at 21:05

@Ganymede `import Data.List` – Sassa NF Sep 10 '13 at 21:22

J. Abrahamson · Answer 2 · 2013-09-12T18:54:34.093

8

Note that while Russell's paradox helps to suggest that this might be non-computable, it still fails even if you change it to s = [e | e <- x, elem e s].

Here's an instructive manual expansion. For any non-empty list x,

s = [e | e <- x, not (e `elem` s)]

simplifies to

s = do e <- x
       guard (not (e `elem` s))
       return e

s = x >>= \e -> if (not (e `elem` s)) then return e else mzero

s = concatMap (\e -> if (not (e `elem` s)) then [e] else []) x

s = foldr ((++) . (\e -> if (not (e `elem` s)) then [e] else [])) [] x

s = foldr (\e xs -> if (not (e `elem` s)) then (e:xs) else xs) [] x

s = foldr (\e ys -> if (e `elem` s) then ys else (e:ys)) [] x

which we can then begin evaluating. Since x was non-empty we can replace it with x:xs and inline a foldr

let f = (\e ys -> if (e `elem` s) then ys else (e:ys))

s = f x (foldr f [] xs)

s = (\ys -> if (x `elem` s) then ys else (x:ys)) (foldr f [] xs)

s = (\ys -> if (x `elem` f x (foldr f [] xs)) then ys else (x:ys)) (foldr f [] xs)

which is where we have our infinite loop—in order to evaluate f x (foldr f [] xs) we must evaluate f x (foldr f [] xs). You might say that the definition of s is not "productive enough" to kickstart its self-recursion. Compare this to the trick fibs definition

fibs = 1:1:zipWith (+) fibs (tail fibs)

which is kick-started with 1:1:... in order to be "productive enough". In the case of s, however, there's no (simple) way to be productive enough (see Will Ness' comment below for a fiendish workaround).
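To make "productive enough" concrete, here is a small sketch of the fibs definition being consumed lazily. The type annotation and the choice of take 10 are assumptions added for illustration.

```haskell
-- The two leading 1s are available before any self-reference is needed,
-- so every later element depends only on elements that already exist.
fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

main :: IO ()
main = print (take 10 fibs)  -- prints [1,1,2,3,5,8,13,21,34,55]
```

The definition of s has no such head start: deciding whether even its first element exists already requires inspecting s.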

If we don't have the not there, it just switches the order of the branches on the if, which we never reach anyway.
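One simple way around the self-reference is to pass the elements seen so far as an explicit argument, roughly along the lines of the mks sketch in Will Ness's comment on the question. The following is an adaptation under that assumption, not the knot-tying gist; uniqueFrom and seen are hypothetical names.

```haskell
-- uniqueFrom xs seen: the elements of xs that are not in `seen` and have
-- not appeared earlier in xs. `seen` grows as elements are emitted, so the
-- membership test only ever looks at a finite, already-built list.
uniqueFrom :: Eq a => [a] -> [a] -> [a]
uniqueFrom []     _    = []
uniqueFrom (y:ys) seen
  | y `elem` seen = uniqueFrom ys seen
  | otherwise     = y : uniqueFrom ys (y : seen)

s :: [Int]
s = uniqueFrom [1, 1, 2, 3, 4] []

main :: IO ()
main = print s  -- prints [1,2,3,4]
```

Like nub, this is quadratic in the worst case, since every element is checked against everything emitted so far.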

edited Sep 12 '13 at 18:54

answered Sep 10 '13 at 21:29

J. Abrahamson



with some contortions [filtering can be made to work](https://gist.github.com/WillNess/6526878#file-nub_as_filter-hs) after all, by restricting `notElem`'s appetite. :) – Will Ness Sep 11 '13 at 17:50
Hah! I started walking down that path a bit too, but didn't go far enough to think of *that*. That's some great knot tying. – J. Abrahamson Sep 11 '13 at 18:22
yeah, I'm wondering, is there _anything_ `concat+map` couldn't do? (cf. [e.g. this](http://stackoverflow.com/a/11951590/849891)). – Will Ness Sep 12 '13 at 05:55
There are probably some things that LogicT can do that concat+map cannot. I'm 90% certain you can do it in pure denotation as well---it's not just that LogicT is more efficient, but it can actually represent more computations. The lingering 10% is the thought that you can still "represent" them with the raw list monad... they're just awkward. – J. Abrahamson Sep 12 '13 at 18:55
turns out [the breaking up of `filter`](https://gist.github.com/WillNess/6526878#file-nub_as_filter-hs) into `map`ping list's elements to possible singletons followed by a `concat` to eliminate empty lists is very similar to what is done in ["Stream Fusion. From Lists to Strings to Nothing at All" paper](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.7401) by introducing `Skip` in addition to `Done` and `Yield` stream generator cases. – Will Ness Sep 23 '13 at 08:03

score 1 · Answer 3 · answered Sep 10 '13 at 22:20

import Data.List (tails)
s = [e | (e:t) <- tails x, not (e `elem` t)]

The above is not meant to be the most efficient solution, but it demonstrates how you could reason about the problem: in order to include each element of x only once, we need to make sure it is the last such element in x. This means we can search for an occurrence of the element in the tail of the list. Data.List.tails produces all suffixes of the list, so we can include the head of a suffix if it doesn't appear in the remainder of that suffix. This is exactly the condition that the head of the suffix is the last such element in the original list.
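A small sketch of the same comprehension wrapped in a function, to show how its result can differ from nub. The name lastOccurrences is hypothetical.

```haskell
import Data.List (nub, tails)

-- Keep an element only if it does not occur again later in the list,
-- i.e. keep the last occurrence of each value.
lastOccurrences :: Eq a => [a] -> [a]
lastOccurrences xs = [e | (e:t) <- tails xs, e `notElem` t]

main :: IO ()
main = do
  print (lastOccurrences [1, 1, 2, 3, 4 :: Int])  -- prints [1,2,3,4]
  print (lastOccurrences [1, 2, 1 :: Int])        -- prints [2,1]
  print (nub [1, 2, 1 :: Int])                    -- prints [1,2]
```

The empty suffix produced by tails fails the (e:t) pattern and is silently skipped, which is why no special case is needed.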

Referencing the value you are defining can cause a non-terminating computation if the function using the value is strict (eager). A function is strict if it always needs the complete value of its argument in order to produce a result.

For example, length is strict in the number of elements of the list, but not necessarily in the elements themselves. So length [[i..] | i <- [1..10]] terminates without computing the values of the elements in the list (the infinite lists). Yet length [[i..] | i <- [1..]] does not terminate, because in order to return a result it needs to establish the existence of every element, which can never finish for an open range.
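The first of these expressions can be run as-is; the Integer annotations below are an assumption added to avoid defaulting warnings.

```haskell
main :: IO ()
main = do
  -- Ten list cells, each holding an unevaluated infinite list.
  -- length walks the ten cells and never looks inside them.
  print (length [[i ..] | i <- [1 .. 10 :: Integer]])  -- prints 10
  -- By contrast, length [[i ..] | i <- [1 :: Integer ..]] would never
  -- return, because the outer list itself has no end.
```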

However,

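-- Compares n with the length of the list; for n >= 0 it inspects
-- at most n + 1 list cells.
gtlength :: Int -> [a] -> Ordering
gtlength n [] = n `compare` 0
gtlength 0 _  = LT  -- the list is non-empty, so n is below its length
gtlength n xs = gtlength (n-1) $ tail xs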

can terminate even for infinite lists, because it doesn't need to evaluate the entire list.
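Here is a hedged variant of the same idea with a usage example. It assumes the intended result is the same as compare n (length xs), and cmpLen is a hypothetical name chosen to avoid clashing with library functions.

```haskell
-- Compare n with the length of a list without traversing more of the
-- list than necessary. Assumed semantics: compare n (length xs).
cmpLen :: Int -> [a] -> Ordering
cmpLen n []     = compare n 0
cmpLen n (_:xs)
  | n <= 0    = LT              -- list still has elements, n is used up
  | otherwise = cmpLen (n - 1) xs

main :: IO ()
main = do
  print (cmpLen 3 [1 :: Integer ..])  -- prints LT, despite the infinite list
  print (cmpLen 3 "abc")              -- prints EQ
  print (cmpLen 5 "abc")              -- prints GT
```

The n <= 0 guard also makes negative n return immediately instead of recursing down an infinite list.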

Your definition hangs because elem is strict in this sense. To establish that an element is absent, it needs to evaluate the entire list, and here that list is s itself, which is not yet available.
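The asymmetry between finding an element and ruling one out can be seen on an infinite list. This is only an illustrative sketch; the Integer annotation is an assumption.

```haskell
main :: IO ()
main = do
  -- elem can answer True as soon as it meets a match,
  -- so this returns even though the list is infinite.
  print (3 `elem` [1 :: Integer ..])  -- prints True
  -- Answering False requires reaching the end of the list, so
  -- 0 `elem` [1 :: Integer ..] would never return.
```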
