Intersection of infinite lists

Question

I know from computability theory that it is possible to take the intersection of two infinite lists, but I can't find a way to express it in Haskell.

The traditional method fails as soon as the second list is infinite, because you spend all your time checking it for a non-matching element in the first list.

Example:

let ones = 1 : ones -- an unending list of 1s
intersect [0,1] ones

This never yields 1, as it never stops checking ones for the element 0.

A successful method needs to ensure that each element of each list will be visited in finite time.

Probably, this will be by iterating through both lists, and spending approximately equal time checking all previously-visited elements in each list against each other.
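A minimal sketch of that idea, using only base. The name isectLockstep and its helper go are hypothetical and not taken from any library. Both lists are walked in step, and each newly visited element is compared against everything already seen on the other side, so every pair of positions is examined after finitely many steps.

```haskell
-- Hypothetical sketch: walk both lists in lockstep, remembering what was seen.
isectLockstep :: Eq a => [a] -> [a] -> [a]
isectLockstep = go [] []
  where
    -- seenX and seenY are the prefixes visited so far (most recent first).
    go seenX seenY (x:xs) (y:ys) =
         [x | x `elem` (y : seenY)]   -- new x against every y visited so far
      ++ [y | y `elem` seenX]         -- new y against every earlier x
      ++ go (x : seenX) (y : seenY) xs ys
    -- One list has ended: filter the rest of the other against the finished one.
    go seenX _ [] ys = filter (`elem` seenX) ys
    go _ seenY xs [] = filter (`elem` seenY) xs
```

With the example from the question, take 1 (isectLockstep [0,1] ones) yields [1]. Duplicates in the inputs still lead to duplicates in the output, and the work per step grows with the depth reached, so this is a sketch of the scheduling idea rather than an efficient implementation.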

If possible, I'd like to also have a way to ignore duplicates in the lists, as it is occasionally necessary, but this is not a requirement.

What result tells you you can take the intersection of two infinite lists? Pretty sure this is not true as such. — leftaroundabout, Feb 17 '17 at 14:29
@leftaroundabout for a somewhat similar, and perhaps simpler example, google the [countability of the rational numbers](https://www.google.com/search?q=countability+of+the+rational+numbers) or [this link](http://www.homeschoolmath.net/teaching/rational-numbers-countable.php) — Zoey Hewll, Feb 17 '17 at 14:56
@AJFarmar what do you mean 'this algorithm'? I didn't supply one — Zoey Hewll, Feb 18 '17 at 01:23
Questions should contain *the problem*, not the solution. If you feel you have come up with a different and/or better answer please *answer your own question* (it's fine to do!). If the question contains an answer, users are unable to vote separately for the two (e.g. one might want to upvote a good answer to a "not-so-good question" without upvoting the question itself or vice versa). — Bakuriu, Feb 19 '17 at 19:49

Daniel Wagner · Answer 1 · 2021-10-05T17:26:53.103

Using the universe package's Cartesian product operator we can write this one-liner:

import Data.Universe.Helpers

isect :: Eq a => [a] -> [a] -> [a]
xs `isect` ys = [x | (x, y) <- xs +*+ ys, x == y]
-- or this, which may do marginally less allocation
xs `isect` ys = foldr ($) [] $ cartesianProduct 
    (\x y -> if x == y then (x:) else id)
    xs ys

Try it in ghci:

> take 10 $ [0,2..] `isect` [0,3..]
[0,6,12,18,24,30,36,42,48,54]

This implementation will not produce any duplicates if the input lists don't have any; but if they do, you can tack on your favorite dup-remover either before or after calling isect. For example, with nub, you might write

> nub ([0,1] `isect` repeat 1)
[1

and then heat up your computer pretty good, since it can never be sure there might not be a 0 in that second list somewhere if it looks deep enough.

This approach is significantly faster than David Fletcher's, produces many fewer duplicates and produces new values much more quickly than Willem Van Onsem's, and doesn't assume the lists are sorted like freestyle's (but is consequently much slower on such lists than freestyle's).

Willem Van Onsem · Answer 2 · 2021-10-05T13:28:30.310

An idea might be to use incrementing bounds. Let us first relax the problem a bit: yielding duplicated values is allowed. In that case you could use:

import Data.List (intersect)

intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite xs ys = intersectInfinite' 1
    where intersectInfinite' n = intersect (take n xs) (take n ys) ++ intersectInfinite' (n+1)

In other words we claim that:

A∩B = (A₁∩B₁) ∪ (A₂∩B₂) ∪ (A₃∩B₃) ∪ ...

where Aᵢ is the set containing the first i elements of A (yes, there is no order in a set, but let's use the order of the list). If A contains fewer than i elements, then Aᵢ is the full set.

If c is in A at index i and in B at index j (counting from 1), c is first emitted in segment (not index) max(i,j), and again in every later segment.

This therefore always generates an infinite list (with infinitely many duplicates) whenever the intersection is non-empty, regardless of whether the given lists are finite. The exception is an empty intersection (for example, when one input is the empty list): nothing is ever emitted and the computation runs forever. Nevertheless, every element of the intersection is guaranteed to be emitted at least once.
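To make the duplication concrete, here is a self-contained sketch of the same prefix-growing idea. The names prefixIntersections, firstEmitted and deduplicated are hypothetical, chosen so that nothing clashes with the definition above; only base is assumed.

```haskell
import Data.List (intersect, nub)

-- Hypothetical name; same prefix-growing idea as described above.
prefixIntersections :: Eq a => [a] -> [a] -> [a]
prefixIntersections xs ys = concatMap segment [1 ..]
  where
    -- Segment n intersects the first n elements of each list.
    segment n = take n xs `intersect` take n ys

-- Multiples of 2 against multiples of 3: the common values are multiples of 6.
firstEmitted :: [Integer]
firstEmitted = take 6 (prefixIntersections [0,2..] [0,3..])        -- [0,0,0,0,6,0]

-- nub drops the repeats lazily, as long as only finitely many results are taken.
deduplicated :: [Integer]
deduplicated = take 3 (nub (prefixIntersections [0,2..] [0,3..]))  -- [0,6,12]
```

The value 0 is re-emitted by every segment, and 6 by every segment from the fourth onward, which is where the flood of duplicates comes from.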

Making the result finite (if the given lists are finite)

Now we can make our definition better. First we make a more advanced version of take, takeFinite (let's first give a straightforward, but not very efficient definition):

takeFinite :: Int -> [a] -> (Bool,[a])
takeFinite _ [] = (True,[])
takeFinite 0 _  = (False,[])
takeFinite n (x:xs) = let (b,t) = takeFinite (n-1) xs in (b,x:t)

Now we can iteratively deepen until both lists have reached the end:

intersectInfinite :: Eq a => [a] -> [a] -> [a]
intersectInfinite = intersectInfinite' 1

intersectInfinite' :: Eq a => Int -> [a] -> [a] -> [a]
intersectInfinite' n xs ys | fa && fb = intersect xs ys
                           | fa = intersect ys xs
                           | fb = intersect xs ys
                           | otherwise = intersect xfa xfb ++ intersectInfinite' (n+1) xs ys
    where (fa,xfa) = takeFinite n xs
          (fb,xfb) = takeFinite n ys

This now terminates when both lists are finite, but it still produces a lot of duplicates. There are certainly ways to reduce them further.

edited Oct 05 '21 at 13:28

answered Feb 17 '17 at 14:37

Willem Van Onsem

Correction: the list would either be infinite or empty. If there is any intersection, it'll be infinite, but if there are no shared elements it'll be empty (of course, if it's empty it will just hang, but that's guaranteed for any method) – Zoey Hewll Feb 17 '17 at 14:46
@ZoeyHewll: yes thank you. I think I resolved the problem a bit more, but the solution can still be improved. – Willem Van Onsem Feb 17 '17 at 14:55
@ZoeyHewll Calling such a list 'empty' is not really correct. `[]` is the empty list. A program which runs forever, and after running forever produces a value (any value - which happens to be here the empty list) does not represent that value, it represents bottom. – user2407038 Feb 17 '17 at 18:51
@user2407038 I don't really know what bottom is, but I was going from a math point of view that even if the computer could fully compute the intersection in finite time, it would still only return the empty list. Kinda like how the lists are never actually infinite, just unbounded and finite, but it's convenient to call them infinite. – Zoey Hewll Feb 18 '17 at 00:32
@ZoeyHewll But Haskell lists *can* actually be infinite... it isn't merely a 'convenience' to reason about them as such. Any lazy recursive data structure can contain infinite values, which is almost the entire point of laziness. Precisely from a math point a view, a computation which takes forever to 'return' a value is bottom (`undefined`) because there is absolutely no way to *distinguish* such a computation (in the theory in which you have formalized 'computation') from one which loops forever doing nothing. (This is commonly known as the 'halting problem') – user2407038 Feb 18 '17 at 06:39
@user2407038 I know of the halting problem. When I say "a math point of view" I mean like set theory, not like computation theory. Though it is impossible to compute the intersection of two infinite lists in finite time, mathematically speaking you can state whether their set intersection is empty if you know they are necessarily mutually exclusive. – Zoey Hewll Feb 20 '17 at 03:04
@user2407038 However I understand your definition of bottom, and it makes sense what you're saying about this computation. I'm not sure how you can state that a computed data structure can be _actually_ infinite, as opposed to simply unbounded, but I don't know much about how laziness works and how you would reason about it theoretically – Zoey Hewll Feb 20 '17 at 03:10

score 5 · Answer 3 · answered Feb 17 '17 at 14:57

Here's one way. For each x we make a list of maybes which has Just x only where x appeared in ys. Then we interleave all these lists.

import Data.Maybe (catMaybes)

isect :: Eq a => [a] -> [a] -> [a]
isect xs ys = (catMaybes . foldr interleave [] . map matches) xs
  where
    matches x = [if x == y then Just x else Nothing | y <- ys]

interleave :: [a] -> [a] -> [a]
interleave [] ys = ys
interleave (x:xs) ys = x : interleave ys xs

Maybe it can be improved using some sort of fairer interleaving - it's already pretty slow on the example below because (I think) it's doing an exponential amount of work.

> take 10 (isect [0..] [0,2..])
[0,2,4,6,8,10,12,14,16,18]
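The unfairness of this interleaving can be observed directly. The sketch below repeats the interleave definition so that it stands alone; the name schedule is hypothetical. Row n is an endless run of the number n, so the output shows which row the fold draws from at each step.

```haskell
interleave :: [a] -> [a] -> [a]
interleave [] ys = ys
interleave (x:xs) ys = x : interleave ys xs

-- Hypothetical helper: element k of the result names the row visited at step k.
schedule :: [Integer]
schedule = foldr interleave [] (map repeat [1 ..])

-- take 8 schedule == [1,2,1,3,1,2,1,4]
```

Row 1 is visited every second step, row 2 every fourth, row 3 every eighth, and so on, which is consistent with the exponential slowdown suspected above.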

answered Feb 17 '17 at 14:57

David Fletcher

Ah, so you still iterate through one list and check the other for matches, but you've interleaved the results... – Zoey Hewll Feb 17 '17 at 15:13
I can't quite figure out how to improve the interleaving, mainly because I can't figure out how it's done currently. I can trace the code, but I can't figure out what effect that `foldr` will have on the order of the elements – Zoey Hewll Feb 17 '17 at 15:13
It favours earlier lists, moving through each one twice as fast as the next - like how in `interleave ["a1", "a2", "a3", "a4", "a5", "a6"] (interleave ["b1", "b2", "b3"] ["c1", "c2", "c3"])` every second element is an `a`. – David Fletcher Feb 17 '17 at 15:40
You might like to use [`diagonal`](http://hackage.haskell.org/package/universe-base-1.0.2.1/docs/Data-Universe-Helpers.html#v:diagonal) for a more fair interleaving. – Daniel Wagner Feb 17 '17 at 16:26
@DanielWagner Ah, `diagonal` is exactly the interleaving method I was thinking would be ideal! – Zoey Hewll Feb 18 '17 at 01:29
@ZoeyHewll Then why did you accept this answer? It doesn't even use `diagonal`, whereas mine does... – Daniel Wagner Feb 18 '17 at 02:19
@DanielWagner where does yours? Also, in this one, I simply replace the `foldr interleave []` with `diagonal` to produce an easily (for me) readable, expressive function. While yours is more concise, I find it takes more work for me to interpret. – Zoey Hewll Feb 18 '17 at 02:32
@ZoeyHewll Right [here](http://hackage.haskell.org/package/universe-base-1.0.2.1/docs/src/Data-Universe-Helpers.html#%2B%2A%2B): `xs +*+ ys = diagonal [[(x, y) | x <- xs] | y <- ys]` – Daniel Wagner Feb 18 '17 at 02:36
@ZoeyHewll And by the way, the comment on the other defining line for `+*+` points out that this implementation is buggy: compare `isect [0..] []` between my implementation and his. This is why library reuse is so important: you get exposure to all the bugfixes anybody has ever done without all the work of thinking of them yourself. – Daniel Wagner Feb 18 '17 at 02:42
@DanielWagner Fair. I hadn't seen the implementation of `+*+`. However, I stick by my accepted answer as [the one I found most helpful](http://stackoverflow.com/help/accepted-answer) – Zoey Hewll Feb 18 '17 at 02:44
@DavidFletcher you're right about your answer being exponential. In `foldr interleave [] $ repeat <$> [1..]`, every second element is `1`, every 4th element is `2`, every `2^n`th element is `n`. This means that in the calculation of the intersection, each element of the first list is tested half as often (against elements of the other list) as the element before it. – Zoey Hewll Feb 18 '17 at 03:21

score 4 · Answer 4 · answered Feb 17 '17 at 15:57

If the elements in both lists are in ascending order, then this is easy to do:

intersectOrd :: Ord a => [a] -> [a] -> [a]
intersectOrd [] _ = []
intersectOrd _ [] = []
intersectOrd (x:xs) (y:ys) = case x `compare` y of
    EQ -> x : intersectOrd xs ys
    LT -> intersectOrd xs (y:ys)
    GT -> intersectOrd (x:xs) ys

answered Feb 17 '17 at 15:57

freestyle

See also [isect](http://hackage.haskell.org/package/data-ordlist-0.4.7.0/docs/Data-List-Ordered.html) from the data-ordlist package; that package provides many other handy operations for working on ordered (and potentially infinite) lists, too. – Daniel Wagner Feb 17 '17 at 15:59

chi · Answer 5 · 2017-02-17T22:56:36.363

Here's yet another alternative, leveraging Control.Monad.WeightedSearch

import Control.Monad (guard)
import Control.Applicative
import qualified Control.Monad.WeightedSearch as W

We first define a cost for digging inside the list. Accessing the tail costs 1 unit more. This will ensure a fair scheduling among the two infinite lists.

eachW :: [a] -> W.T Int a
eachW = foldr (\x w -> pure x <|> W.weight 1 w) empty

Then we can simply ignore the fact that the lists may be infinite.

intersection :: [Int] -> [Int] -> [Int]
intersection xs ys = W.toList $ do
   x <- eachW xs
   y <- eachW ys
   guard (x==y)
   return y

Even better with MonadComprehensions on:

intersection2 :: [Int] -> [Int] -> [Int]
intersection2 xs ys = W.toList [ y | x <- eachW xs, y <- eachW ys, x==y ]

score 0 · Accepted Answer · answered Feb 20 '17 at 05:10

Solution

I ended up using the following implementation; a slight modification of the answer by David Fletcher:

import Data.Maybe (catMaybes)
import Data.Universe.Helpers (diagonal)

isect :: Eq a => [a] -> [a] -> [a]
isect [] = const [] -- don't bother testing against an empty list
isect xs = catMaybes . diagonal . map matches
    where matches y = [if x == y then Just x else Nothing | x <- xs]

This can be augmented with nub (from Data.List) to filter out duplicates:

isectUniq :: Eq a => [a] -> [a] -> [a]
isectUniq xs = nub . isect xs

Explanation

Regarding the line isect xs = catMaybes . diagonal . map matches:

(map matches) ys computes a list of lists of comparisons between elements of xs and ys, where the list indices specify the indices in ys and xs respectively: i.e., (map matches) ys !! 3 !! 0 would represent the comparison of ys !! 3 with xs !! 0, which would be Nothing if those values differ. If those values are the same, it would be Just that value.

diagonals takes a list of lists and returns a list of lists where the nth output list contains one element from each of the first n lists. Another way to conceptualise it is that (diagonals . map matches) ys !! n contains comparisons between elements whose indices in xs and ys sum to n.
diagonal is simply a flat version of diagonals (diagonal = concat . diagonals).

Therefore (diagonal . map matches) ys is a list of comparisons between elements of xs and ys, where the elements are approximately sorted by the sum of the indices of the elements of ys and xs being compared; this means that early elements are compared to later elements with the same priority as middle elements being compared to each other.

(catMaybes . diagonal . map matches) ys is a list of only the elements which are in both lists, where the elements are approximately sorted by the sum of the indices of the two elements being compared.
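For readers without the universe package at hand, the behaviour described above can be approximated with base alone. This is a sketch under stated assumptions: diagonalsSketch and isectSketch are hypothetical names, the code is not the library's source, and it omits the empty-list special case of the real isect.

```haskell
import Data.List (transpose)
import Data.Maybe (catMaybes)

-- Hypothetical stand-in for the library's diagonals; not its actual source.
diagonalsSketch :: [[a]] -> [[a]]
diagonalsSketch = drop 1 . go []
  where
    -- open: the unconsumed remainders of the rows started so far.
    go open rows = [h | h : _ <- open] : rest
      where
        remaining = [t | _ : t <- open]
        rest = case rows of
          []     -> transpose remaining       -- no new rows: drain what is left
          r : rs -> go (r : remaining) rs     -- start one more row at each step

-- Same shape as the solution, with diagonal replaced by concat . diagonalsSketch.
isectSketch :: Eq a => [a] -> [a] -> [a]
isectSketch xs = catMaybes . concat . diagonalsSketch . map matches
  where matches y = [if x == y then Just x else Nothing | x <- xs]

-- diagonalsSketch [[1,2,3],[4,5,6],[7,8,9]] == [[1],[4,2],[7,5,3],[8,6],[9]]
```

Each output list collects the cells whose row and column indices share the same sum, which is the ordering the explanation relies on.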

Note
(diagonal . map (catMaybes . matches)) ys does not work: catMaybes . matches only yields when it finds a match, instead of also yielding Nothing on no match, so the interleaving does nothing to distribute the work.

To contrast, in the chosen solution, the interleaving of Nothing and Just values by diagonal means that the program divides its attention between 'searching' for multiple different elements, not waiting for one to succeed; whereas if the Nothing values are removed before interleaving, the program may spend too much time waiting for a fruitless 'search' for a given element to succeed.

Therefore, we would encounter the same problem as in the original question: while one element does not match any elements in the other list, the program will hang; whereas the chosen solution will only hang while no matches are found for any elements in either list.
