Could pattern-matching in do-notation/enumFromTo slow down a Haskell code?

Question

I've been solving a pretty easy problem: generation of all decreasing sequences in length of L, consisting of natural numbers from 1 up to M in lexicographical order. Yet, I ran into a quite strange issue. Take a look:

c :: (Ord a, Num a, Enum a) => a -> a -> [[a]]
c m 1 = map return [1..m]
c m l = do
          n   <- [l..m]
          res <- c (n - 1) (l - 1)
          return $ n:res

c' :: (Ord a, Num a, Enum a) => a -> a -> [[a]]
c' m = helper 1 m where
 helper :: (Ord a, Num a, Enum a) => a -> a -> a -> [[a]]
 helper a b 1 = map return [a..b]
 helper a b l = do
                  n    <- [a..b]
                  True <- return $ (l - 1 <= n)
                  res  <- helper a (n - 1) (l - 1)
                  return (n:res)

So, obviously, those 2 functions do absolutely the same thing (I checked them on a number of tests, they both give correct results on each), but if you try to evaluate c 100 98 and c' 100 98 in GHCi, you will see an enormous difference in time it takes:

c 100 98: around 5 seconds;

c' 100 98: around 70 seconds;

As I've mentioned, the result is the same.

So, I kind of feel uneasy about generating [a..b] every time, yet I did a small bit of asking around, and there was a suggestion that Haskell doesn't pattern-match right off the bat, but delays it due to lazy-evaluations, which causes a formidable amount of extra calls of c'. However, the second theory didn't quite hold: I set a breakpoint in my code, directly from the GHCi command prompt, to monitor the value of n, which showed that delayed pattern-matching wasn't the case.

Could the problem be actually with the enumFromTo function, or is there any other reason?

1. Why `True <- return foo` instead of `guard foo`? 2. Why pass `a` around, when it never changes from its initial value of `1`? 3. Why `n <- [1..b]; guard (l-1 <= n)` instead of `n <- [max 1 (l-1)..n]`, or even `n <- [l-1 .. n]` if you aren't paranoid about such refactorings? — Daniel Wagner, Mar 04 '19 at 18:31
@DanielWagner I understand that the second snippet is written extremely poorly, it was the core that interested me. Yet, the mistake was tremendously awkward. Thanks for your help! — Zhiltsoff Igor, Mar 05 '19 at 18:31

score 1 · Answer 1 · answered Mar 03 '19 at 16:02

1

The two functions seem to have a completely different implementation:

c m l = do
      n   <- [l..m]
      res <- c (n - 1) (l - 1)
      return $ n:res

Here, at every recursive call, the parameter l gets decremented, while the parameter m becomes n <- [l--m].

By comparison,

helper a b l = do
    n    <- [a..b]
    True <- return $ (l - 1 <= n)
    res  <- helper a (n - 1) (l - 1)
    return (n:res)

Here the interval is [a..b] instead of [l..m] (why do you use different names, by the way? It's harder to compare the two snippets in that way.) So, we consider how parameters a and b change. Parameter a is unchanged, while b becomes n-1.

There's also a third argument l which was not present in the first snippet.

I fail to see how this would be the same algorithm. It looks completely different to me. You probably are causing more recursive calls here, which slow things down. Pattern matching is a red herring -- I think it's not that that's slowing things down, at least not directly.

Also, this part

    n    <- [a..b]
    True <- return $ (l - 1 <= n)

looks very suspicious. It should be something like

    n    <- [max a (l-1) .. b]

since the above will count from a to l-2 only to discard those choices in the next line. Generating choices only to discard them can slow your program down.

answered Mar 03 '19 at 16:02

chi

111,837
3
133
218

Why, hello. Sorry for the weird appearance of my code :"). The only difference in those two algorithms is that one changes both right and left boundaries, yet the second one changes the right boundary and filters unsuitable values of `n` via pattern-matching. You've said that I cause extra recursive calls, but that doesn't seem right since we filter out the same values every time, but we use different tools for that. Thus, I guess, my code was too bad to see that, but it doesn't seem to be the case. (I ran out of words, there's going to be the second half of this comment). – Zhiltsoff Igor Mar 03 '19 at 17:24
(Part 2: sorry for that). >Generating choices only to discard them can slow your program down. Yes, that's what I attempted to express when I was talking about the redundant use of `enumFromTo` function, but I was just caught off guard by +65 seconds. It just feels that there's some other issue that I'm not able to spot. Huge thanks for your time! – Zhiltsoff Igor Mar 03 '19 at 17:25
@ZhiltsoffIgor Perhaps you could try to find out how many values are actually "discarded". If they are many, they can make the code much slower. E.g. `[1..100]` is much faster than `filter (>0) [-1000000000..100]` – chi Mar 03 '19 at 18:46
Finding out the number of discarded values sounds like a nice exercise, yet not so simple, at least for me, since we can't really use both List and State Monad at the same time, obviously. There was a suggestion that there could be some strongarm method of counting via GHCi, but thereś still some research to be done. Do you know any more elegant way, perhaps? Anyways, I believe, we got our answer from @DanielWagner. I don't know how `l` was mixed up with `l - 1`, but it, eventually, was. Thanks for your time! – Zhiltsoff Igor Mar 05 '19 at 18:29
@ZhiltsoffIgor In this specific case, you could "print" `a` and `l-1` using `Debug.Trace.trace` to observe if they are very different. That might produce a lot of debug output, if there are many calls. – chi Mar 05 '19 at 18:38
I believe, 23527350 values. Around 10^7; seems about right. – Zhiltsoff Igor Mar 06 '19 at 18:38

score 1 · Accepted Answer · answered Mar 04 '19 at 18:26

Changing your True <- return $ (l - 1 <= n) to True <- return $ (l <= n), to match what the first snippet does, equalizes the timings of the two for me (without changing the answer).

Without this change, your second snippet wastes a lot of time trying to find decreasing sequences of length l among the numbers [1..l-1] (for many different values of l), a doomed task.

Could pattern-matching in do-notation/enumFromTo slow down a Haskell code?

2 Answers2