I noticed this remark in lens claiming that lengthOf can be "rather inefficient":

-- /Note:/ This can be rather inefficient for large containers [...]
lengthOf :: Getting (Endo (Endo Int)) s a -> s -> Int
lengthOf l = foldlOf' l (\a _ -> a + 1) 0

Can it be asymptotically inefficient? Can it leak space despite being a strict left fold, or does GHC generate too much "busywork" code?

sevo

1 Answer

Update: See below -- the poor performance of lengthOf in the benchmark seems to come down to a lack of specialization to the list case.

As @WillemVanOnsem indicates, I think the comment is mostly referring to the fact that this particular approach -- of running through the elements of the container with a counter -- will be inefficient for containers that have some other method of returning a length. For example, for a very large vector v, you could technically use lengthOf traverse v, but Data.Vector.length v will be much faster.
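To make the contrast concrete, here is a minimal sketch that uses only base and containers. It substitutes Data.Map for the vector example, purely because Map.size is documented as O(1); countByFold is a made-up name that mirrors the counting step lengthOf uses, not a function from lens.

```haskell
import Data.Foldable (foldl')
import qualified Data.Map.Strict as Map

-- Count elements by walking the whole container, the same shape of work
-- that lengthOf performs with its (\a _ -> a + 1) step function.
-- (countByFold is a hypothetical helper, not part of lens.)
countByFold :: Foldable t => t a -> Int
countByFold = foldl' (\n _ -> n + 1) 0

main :: IO ()
main = do
  let m = Map.fromList [(k, k * 2) | k <- [1 .. 100000 :: Int]]
  print (countByFold m)  -- visits every element: O(n)
  print (Map.size m)     -- reads the size stored in the root node: O(1)
```

Both calls print the same number; the difference is only in how much of the structure has to be visited to get it.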

On the other hand, lengthOf can be quite inefficient for, say, counting the elements in a list. The following benchmark:

import Criterion.Main
import Control.Lens

l :: [Int]
l = replicate 1000000 123

main = defaultMain
  [ bench "Prelude.length"        $ whnf length l
  , bench "Control.Lens.lengthOf" $ whnf (lengthOf traverse) l
  ]

shows that length is about 15 times faster than lengthOf. (I used GHC 8.4.3 with -O2 for all my tests.)

Note that this difference isn't a result of list fusion (since there's no fusion in the Prelude.length case when the whnf call is used).

It's actually a result of specialization of the code to lists. Even though Prelude.length is applicable to any Foldable, the instance for lists uses a list-specific implementation that's essentially equivalent to:

myLength :: [a] -> Int
myLength xs = lenAcc xs 0
  where lenAcc [] n = n
        lenAcc (_:ys) n = lenAcc ys (n+1)

(I didn't check for sure that this was the implementation being used, but myLength had nearly equivalent performance to Prelude.length.)

The Core for myLength uses unboxed integers in a loop that directly pattern matches the list constructors, more or less like:

lenAcc
  = \xs n ->
      case xs of
        [] -> n
        (:) _ xs' -> lenAcc xs' (+# n 1#)
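The unboxed loop can be approximated in source Haskell with the MagicHash extension. This is a hand-written sketch of what the optimizer arrives at, not GHC output, and the names lenAcc# and unboxedLength are invented for illustration.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- Worker loop over a raw machine integer (Int#): no boxing and no thunks,
-- just a pattern match on the list constructors per iteration.
-- (lenAcc# and unboxedLength are hypothetical names.)
lenAcc# :: [a] -> Int# -> Int#
lenAcc# [] n = n
lenAcc# (_ : ys) n = lenAcc# ys (n +# 1#)

-- Wrapper that boxes the result exactly once, at the end.
unboxedLength :: [a] -> Int
unboxedLength xs = I# (lenAcc# xs 0#)

main :: IO ()
main = print (unboxedLength (replicate 1000000 'x'))
```

In ordinary code there is no need to write this by hand; with optimization enabled GHC usually derives the same worker from the boxed myLength definition.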

It turned out that if I used lengthOf in a more realistic program with ample room to specialize to a list in the same way:

import Control.Lens

l :: [Int]
{-# NOINLINE l #-}
l = replicate 1000000 123

myLength :: [a] -> Int
myLength = lengthOf traverse

main = print (myLength l)

it generated Core like the following, which is the same as above except for an extra parameter that was essentially a casting identity function:

lenAcc'
  = \n id' xs ->
      case xs of {
        [] -> id' (I# n);
        (:) _ xs' -> lenAcc' (+# n 1#) id' xs'
      }

I wasn't able to benchmark it, but it would probably be plenty fast.
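As for where the extra id' argument might come from, the following base-only sketch rebuilds lengthOf from Const and Endo. It is modeled on how I understand lens to define foldrOf and foldlOf', but it is a simplified reconstruction: the real library uses coercion helpers and may differ between versions. Under that assumption, id' would correspond to the Endo id continuation that foldlOf' starts from.

```haskell
import Data.Functor.Const (Const (..))
import Data.Monoid (Endo (..))

-- Simplified stand-in for the lens type of the same name.
type Getting r s a = (a -> Const r a) -> s -> Const r s

-- Right fold: collect one Endo per element, then apply the composition.
foldrOf :: Getting (Endo r) s a -> (a -> r -> r) -> r -> s -> r
foldrOf l f z = flip appEndo z . getConst . l (Const . Endo . f)

-- Strict left fold written as a right fold over continuations.
-- Each element wraps the continuation k; the chain starts at Endo id.
foldlOf' :: Getting (Endo (Endo r)) s a -> (r -> a -> r) -> r -> s -> r
foldlOf' l f z0 xs = foldrOf l f' (Endo id) xs `appEndo` z0
  where
    f' x (Endo k) = Endo $ \z -> k $! f z x

lengthOf :: Getting (Endo (Endo Int)) s a -> s -> Int
lengthOf l = foldlOf' l (\a _ -> a + 1) 0

main :: IO ()
main = print (lengthOf traverse [1 .. 1000 :: Int])
```

Whether the Endo wrappers and the continuation are compiled away depends on how much GHC can inline and specialize at the call site, which is consistent with the gap between the benchmark and the standalone program.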

So, lengthOf traverse is capable of being optimized to be almost as fast as Prelude.length, but depending on how it's used, it might end up using a really inefficient implementation.

K. A. Buhr