8

For example:

intersectBy : (a -> a -> Bool) -> List a -> List a -> List a
intersectBy _  [] _     =  []
intersectBy _  _  []    =  []
intersectBy eq xs ys    =  [x | x <- xs, any (eq x) ys]

There are extra patterns for [], and it seems the same ones are used in Haskell's Data.List, but what kind of optimization is that? And how does Idris differ here?

I ask because I was told that "it will make reasoning about it more difficult", and the person who told me that had no time to explain it fully.

I doubt that I can work it out myself by trying to "reduce the proof" of the function.

Could someone explain the policy on extra patterns here from the Haskell and Idris points of view, so that I can understand and see the difference?

cnd

4 Answers

13

Semantically speaking, the pattern

intersectBy _  [] _     =  []

looks redundant, even from a performance point of view. By contrast, in Haskell

intersectBy _  _  []    =  []

is not redundant since otherwise

intersectBy (==) [0..] []

would diverge, since the comprehension would try every element x <- [0..] and never finish.
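A minimal sketch of that difference, assuming the hypothetical names `intersectPlain` (general equation only) and `intersectGuarded` (with the right-empty equation added); neither name comes from Data.List:

```haskell
module Main where

-- Hypothetical name: only the general equation, no empty-list special cases.
intersectPlain :: (a -> a -> Bool) -> [a] -> [a] -> [a]
intersectPlain eq xs ys = [x | x <- xs, any (eq x) ys]

-- Hypothetical name: the same function with the right-empty equation added.
intersectGuarded :: (a -> a -> Bool) -> [a] -> [a] -> [a]
intersectGuarded _  _  [] = []
intersectGuarded eq xs ys = [x | x <- xs, any (eq x) ys]

main :: IO ()
main = do
  -- Terminates: the second list is inspected first and found to be empty.
  print (intersectGuarded (==) [0 ..] ([] :: [Integer]))
  -- On finite inputs both versions give the same answer.
  print (intersectPlain   (==) [1, 2, 3] [2, 3, 4 :: Integer])
  print (intersectGuarded (==) [1, 2, 3] [2, 3, 4 :: Integer])
  -- Deliberately not run: intersectPlain (==) [0 ..] [] never finishes,
  -- because the comprehension keeps testing elements of the infinite list.
```

The non-terminating call is left as a comment on purpose; running it would hang rather than fail.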

I am not sure I like this special case, though. Why shouldn't we add a special case covering intersectBy (==) [0..] [2] so that it returns [2]? Further, if performance is the concern, in many cases I'd like to use an O(n log n) approach through pre-sorting, even if this does not work on infinite lists and requires Ord a.

chi
  • Nice, it makes sense now! But how would you actually add a special case covering `intersect (==) [0..] [2]`? – Sibi Apr 22 '15 at 09:51
  • why does `_ [] _` look redundant? – cnd Apr 22 '15 at 09:51
  • 1
    @Heather because `[x | x <- xs , ...]` evaluates to `[]` immediately when `xs` is empty. – chi Apr 22 '15 at 09:53
  • @chi but empty cases are used just about everywhere in the GHC Prelude, even in this function (or at least they were): https://github.com/ghc/ghc/blob/c5977c2e2951e9e346a8f4990d5a6bbdbf9cee0b/libraries/base/Data/OldList.hs#L436-L440 – cnd Apr 22 '15 at 09:54
  • @Sibi For instance `intersectBy xs [y] = [ y | elem y xs]`. I would not really add that in real code -- it's just for the sake of argument. – chi Apr 22 '15 at 09:54
  • 1
    @chi That's not a valid implementation for `intersectBy`. Also, what if the input is something like `intersect (==) [2] [0..]`? I think that's why that special case isn't considered. – Sibi Apr 22 '15 at 09:58
  • @Heather In other cases it is often necessary. Here it does not seem to be the case. Maybe the authors of that version of Prelude felt the code followed a better style in this way -- I have no idea. – chi Apr 22 '15 at 09:59
  • 1
    @Sibi I forgot the "by". Use instead `intersectBy eq xs [y] = [ y | any (eq y) xs]`. The symmetric case is already handled by the last general comprehension. Still, I feel these special cases make the semantics more complex, and are harder to understand. If one list is infinite and the other contains some element not in the former list, the intersection has to diverge anyway. Making special cases for null, singleton, or even finite lists feels tricky to me. – chi Apr 22 '15 at 10:03
  • @chi Thanks for explaining. I think the prelude authors might have added the special cases just for the empty list in the presence of infinite lists. But that's just a guess.... :) – Sibi Apr 22 '15 at 10:14
  • I think you could argue that a correct intersection would be sure to terminate if *either* list is finite. This will not, however, be very pretty, especially given the `By` and no ordering. – dfeuer Apr 22 '15 at 20:03
11

There's no need to guess when you can look up the history through git blame, GHC Trac and the libraries mailing list.

Originally the definition was just the third equation,

intersectBy eq xs ys    =  [x | x <- xs, any (eq x) ys]

In https://github.com/ghc/ghc/commit/8e8128252ee5d91a73526479f01ace8ba2537667 the second equation was added as a strictness/performance improvement, and at the same time, the first equation was added so as to make the new definition always at least as defined as the original. Otherwise, intersectBy f [] _|_ would be _|_ when it was [] before.

It seems to me that this current definition is now maximally lazy: it is as defined as possible for any inputs, except that one has to choose whether to check the left or right list for emptiness first. (And, as I mentioned above, this choice is made to be consistent with the historical definition.)
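The definedness argument can be checked with a small sketch. The names `original`, `rightOnly`, `current` and `probe` are made up for illustration, and `rightOnly` is a hypothetical intermediate version (the commit added both equations together, so it was never the library definition):

```haskell
module Main where

import Control.Exception (ErrorCall, evaluate, try)

-- The original single-equation definition.
original :: (a -> a -> Bool) -> [a] -> [a] -> [a]
original eq xs ys = [x | x <- xs, any (eq x) ys]

-- Hypothetical intermediate version: only the right-empty check added.
rightOnly :: (a -> a -> Bool) -> [a] -> [a] -> [a]
rightOnly _  _  [] = []
rightOnly eq xs ys = [x | x <- xs, any (eq x) ys]

-- The three-equation definition discussed above.
current :: (a -> a -> Bool) -> [a] -> [a] -> [a]
current _  [] _  = []
current _  _  [] = []
current eq xs ys = [x | x <- xs, any (eq x) ys]

-- Force a list through its length and report whether that hit `undefined`.
probe :: [Int] -> IO String
probe xs = do
  r <- try (evaluate (length xs)) :: IO (Either ErrorCall Int)
  pure (either (const "bottom") (const "defined") r)

main :: IO ()
main = do
  probe (original  (==) [] undefined) >>= putStrLn  -- defined
  probe (rightOnly (==) [] undefined) >>= putStrLn  -- bottom
  probe (current   (==) [] undefined) >>= putStrLn  -- defined
```

With only the right-empty check, the third argument is forced before the empty first list is ever looked at, which is why the left-empty equation is needed to stay as defined as the original.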

Reid Barton
  • Wouldn't adding `intersectBy eq xs [y] = [y | any (eq y) xs]` make it even more defined, e.g. `intersectBy (==) [0..] [2]` ? – chi Apr 22 '15 at 22:38
  • 2
    @chi, yes, I believe so, and I believe it should be possible to take that trick all the way, ensuring a finite result if either argument is finite. But the current implementation does not remove duplicates and does not change ordering, whereas yours removes duplicates and my hypothetical one would remove duplicates and reorder. – dfeuer Apr 22 '15 at 22:55
  • @dfeuer Good point about the duplicates -- one might argue that my addition changes the semantics of the original. About taking the trick all the way: it might be possible but it would require some non trivial "fair scheduling" since you don't know in advance which of these lists is the finite one. Also, you need to assume that the finite one is a subset of the other, since you can't check whether something is not an element of an infinite list. – chi Apr 22 '15 at 23:07
  • @chi, good point about the semidecidability. I forgot about that! – dfeuer Apr 22 '15 at 23:32
  • So it is an optimization, if I understand correctly, but sadly no one has yet explained the possible problems with reasoning. For now it has been added to Idris without the empty patterns. – cnd Apr 23 '15 at 06:25
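The trade-off raised in these comments can be seen in a short sketch. `current` and `withSingleton` are hypothetical names; the singleton equation is the one proposed in the comments above, not anything found in Data.List:

```haskell
module Main where

-- The three-equation definition, under a hypothetical name.
current :: (a -> a -> Bool) -> [a] -> [a] -> [a]
current _  [] _  = []
current _  _  [] = []
current eq xs ys = [x | x <- xs, any (eq x) ys]

-- Hypothetical variant with the singleton equation from the comments.
withSingleton :: (a -> a -> Bool) -> [a] -> [a] -> [a]
withSingleton _  [] _   = []
withSingleton _  _  []  = []
withSingleton eq xs [y] = [y | any (eq y) xs]
withSingleton eq xs ys  = [x | x <- xs, any (eq x) ys]

main :: IO ()
main = do
  -- This call now terminates even though the first list is infinite.
  print (withSingleton (==) [0 ..] [2 :: Integer])
  -- The price: duplicates in the first list are kept by `current`
  -- but collapsed by the singleton equation.
  print (current       (==) [2, 2] [2 :: Integer])
  print (withSingleton (==) [2, 2] [2 :: Integer])
```

The last two lines print different lists for the same arguments, which is the change in semantics mentioned above.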
5

@chi explains the _ _ [] case, but _ [] _ serves a purpose as well: it dictates how intersectBy handles bottom. With the definition as written:

λ. intersectBy undefined    []     undefined
[]
λ. intersectBy   (==)    undefined    []
*** Exception: Prelude.undefined

Remove the first pattern and it becomes:

λ. intersectBy undefined undefined    []
[]
λ. intersectBy   (==)       []     undefined
*** Exception: Prelude.undefined

I'm not 100% certain of this, but I believe there's also a performance benefit to not binding anything in the first pattern. The final pattern will give the same result for xs == [] without evaluating eq or ys, but AFAIK it still allocates stack space for their thunks.

Community
  • 2
    Not likely, no. The compiler will erase unused bindings. It's true that in some (relatively unusual) circumstances, an unused argument will be created as a thunk, but that will happen regardless of whether it's bound. – dfeuer Apr 22 '15 at 22:07
  • Interesting. So the general advice to not bind unless you're using it is just for good style, not any technical consideration? – Theodore Lief Gannon Apr 22 '15 at 22:10
  • 2
    Yes. It makes it clear to readers that it's not used. There's a compiler warning available for unused bindings (enabled by `-Wall` or by `-fwarn-unused-binds` or some such). It prevents you from accidentally forgetting to use something that you intended to, where that intent is signalled by binding a name that does not begin with an underscore. – dfeuer Apr 22 '15 at 22:13
  • 1
    Good style also recommends that arguments that are *sometimes* used be bound, where unused, to names that begin with underscores. `f 0 _x = 3; f _ x = x`. This is also likely an effective way to confuse Agda programmers, which is always a good goal. – dfeuer Apr 22 '15 at 22:15
4

There is a big difference in Idris: Idris lists are always finite! Furthermore, Idris is a mostly strict (call-by-value) language, and optionally uses a totality checker, so it's pretty reasonable to assume there won't be any bottoms hiding in the argument lists. The significance of that difference is that the two definitions are much more nearly semantically identical in Idris than in Haskell. The choice of which to use may be made based on the ease of proving properties of the function, or may be based on simplicity.
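As a rough illustration, written in Haskell rather than Idris, the two definitions can be compared on finite, fully defined lists only, which is the situation Idris guarantees. The names `plain`, `current`, `smallLists` and `agree` are made up for this sketch, and an exhaustive check over small inputs is evidence, not a proof:

```haskell
module Main where

-- General equation only.
plain :: (a -> a -> Bool) -> [a] -> [a] -> [a]
plain eq xs ys = [x | x <- xs, any (eq x) ys]

-- With both empty-list equations.
current :: (a -> a -> Bool) -> [a] -> [a] -> [a]
current _  [] _  = []
current _  _  [] = []
current eq xs ys = [x | x <- xs, any (eq x) ys]

-- Every list over {0, 1, 2} of length at most 3 (40 lists in total).
smallLists :: [[Int]]
smallLists = concatMap (\n -> sequence (replicate n [0, 1, 2])) [0 .. 3]

-- True when both definitions give the same result on every pair of inputs.
agree :: Bool
agree = and [ plain (==) xs ys == current (==) xs ys
            | xs <- smallLists, ys <- smallLists ]

main :: IO ()
main = print agree
```

With no infinite lists and no bottoms among the inputs, the extra equations have no observable effect here, so the choice comes down to which form is easier to prove things about.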

dfeuer