Update: See below -- the poor performance of lengthOf seems to be just a lack of specialization to the list case when benchmarking.
As @WillemVanOnsem indicates, I think the comment is mostly referring to the fact that this particular approach -- of running through the elements of the container with a counter -- will be inefficient for containers that have some other method of returning a length. For example, for a very large vector v, you could technically use lengthOf traverse v, but Data.Vector.length v will be much faster.
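To make the "counter versus stored length" distinction concrete, here is a small sketch. It swaps in Data.Sequence (from containers) for Data.Vector and a plain strict fold for lengthOf traverse, purely so that it needs nothing beyond the libraries bundled with GHC; countByWalking, storedLength, and bigSeq are names I made up for illustration.

```haskell
import Data.Foldable (foldl')
import qualified Data.Sequence as Seq

-- Hypothetical helper: count elements by walking the whole container,
-- which is roughly what a traversal-based length has to do (O(n)).
countByWalking :: Foldable t => t a -> Int
countByWalking = foldl' (\n _ -> n + 1) 0

-- Seq caches its size, so asking for the length is O(1).
storedLength :: Seq.Seq a -> Int
storedLength = Seq.length

-- Example container; the size is an arbitrary choice for illustration.
bigSeq :: Seq.Seq Int
bigSeq = Seq.fromList [1 .. 1000000]
```

Both functions return the same number for bigSeq, but only countByWalking has to visit every element to get there.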
On the other hand, lengthOf can be quite inefficient for, say, counting the elements in a list. The following benchmark:
import Criterion.Main
import Control.Lens

l :: [Int]
l = replicate 1000000 123

main :: IO ()
main = defaultMain
  [ bench "Prelude.length" $ whnf length l
  , bench "Control.Lens.lengthOf" $ whnf (lengthOf traverse) l
  ]
shows that length is about 15 times faster than lengthOf. (I used GHC 8.4.3 with -O2 for all my tests.)
Note that this difference isn't a result of list fusion (since there's no fusion in the Prelude.length case when the whnf call is used).
It's actually a result of specialization of the code to lists. Even though Prelude.length is applicable to any Foldable, the instance for lists uses a list-specific implementation that's essentially equivalent to:
myLength :: [a] -> Int
myLength xs = lenAcc xs 0
  where
    lenAcc [] n = n
    lenAcc (_:ys) n = lenAcc ys (n+1)
(I didn't check for sure that this was the implementation being used, but myLength had nearly equivalent performance to Prelude.length.)
The Core for myLength uses unboxed integers in a loop that directly pattern matches the list constructors, more or less like:
lenAcc
  = \xs n ->
      case xs of
        [] -> n
        (:) _ xs' -> lenAcc xs' (+# n 1#)
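For anyone who would rather read that loop as ordinary source than as Core, here is a rough hand-written approximation using GHC's MagicHash extension. The name lengthUnboxed is mine, and this is only a sketch of what the optimizer arrives at, not code taken from base:

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- Hypothetical name: a list length that keeps its counter as a raw,
-- unboxed Int# for the whole loop and boxes it once at the end.
lengthUnboxed :: [a] -> Int
lengthUnboxed xs0 = I# (go xs0 0#)
  where
    go :: [b] -> Int# -> Int#
    go [] n = n                      -- end of list: return the counter
    go (_ : xs) n = go xs (n +# 1#)  -- skip the element, bump the counter
```

In GHCi, lengthUnboxed "abc" should agree with length "abc". The point is just that no Int boxes are allocated per element, which is what makes the specialized list code fast.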
It turned out that if I used lengthOf in a more realistic program with ample room to specialize to a list in the same way:
import Control.Lens
l :: [Int]
{-# NOINLINE l #-}
l = replicate 1000000 123
myLength :: [a] -> Int
myLength = lengthOf traverse
main = print (myLength l)
it generated Core like the following, which is the same loop as above with an extra parameter that was essentially a casting identity function:
lenAcc'
  = \n id' xs ->
      case xs of {
        [] -> id' (I# n);
        (:) _ xs' -> lenAcc' (+# n 1#) id' xs'
      }
I wasn't able to benchmark it, but it would probably be plenty fast.
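As far as I can tell, the extra id' parameter comes from the way lens builds its strict left fold: each element wraps a continuation inside nested Endo values, and the whole thing is finally run with an identity continuation. The base-only sketch below mimics that construction as I understand it. Counting and countOf are hypothetical names, not the lens API, and the real definitions go through extra newtypes and coercions.

```haskell
import Data.Functor.Const (Const (..))
import Data.Monoid (Endo (..))

-- Hypothetical stand-in for the kind of fold that lengthOf accepts:
-- anything shaped like 'traverse' specialised to a Const functor.
type Counting s a =
  (a -> Const (Endo (Endo Int)) a) -> s -> Const (Endo (Endo Int)) s

-- Count the targets of a traversal by threading a strict counter
-- through a chain of continuations.
countOf :: Counting s a -> s -> Int
countOf l s = appEndo (appEndo (getConst (l step s)) (Endo id)) 0
  where
    -- Each element takes the continuation k for "the rest of the fold"
    -- and returns one that first bumps the counter, then calls k.
    step _ = Const (Endo (\(Endo k) -> Endo (\n -> k $! n + 1)))
```

Here countOf traverse plays the role of lengthOf traverse, and the Endo id passed in at the end corresponds to the id' that survives in the Core. When GHC can see the list type, it can collapse all of this into the tight loop shown above.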
So, lengthOf traverse is capable of being optimized to be almost as fast as Prelude.length, but depending on how it's used, it might end up using a really inefficient implementation.