
TL;DR

After reading the passage about persistence in Okasaki's Purely Functional Data Structures and going over his illustrative examples about singly linked lists (which is how Haskell's lists are implemented), I was left wondering about the space complexities of Data.List's inits and tails...

It seems to me that

  • the space complexity of tails is linear in the length of its argument, and
  • the space complexity of inits is quadratic in the length of its argument,

but a simple benchmark indicates otherwise.

Rationale

With tails, the original list can be shared. Computing tails xs amounts to walking along xs and creating, for each suffix, a new pointer into the existing list; no part of xs needs to be recreated in memory.

In contrast, because each element of inits xs "ends in a different way", there can be no such sharing, and all the possible prefixes of xs must be recreated from scratch in memory.
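The sharing argument for tails can be made concrete with a minimal reimplementation (named tails' here to avoid clashing with Data.List.tails): every suffix in the result is literally a pointer into the original list.

```haskell
-- Minimal sketch of tails (hypothetical name tails', to avoid a
-- clash with Data.List.tails). Each element of the result is the
-- remaining suffix of xs itself, so no cons cell of xs is copied;
-- only the outer list's spine is allocated, i.e. O(n) cells.
tails' :: [a] -> [[a]]
tails' xs = xs : case xs of
  []      -> []
  _ : xs' -> tails' xs'
```

For example, tails' [1,2,3] yields [[1,2,3],[2,3],[3],[]], and each of those suffixes shares its cells with the input list.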

Benchmark

The simple benchmark below shows there isn't much of a difference in memory allocation between the two functions:

-- Main.hs

import Data.List (inits, tails)

main = do
    let intRange = [1 .. 10 ^ 4] :: [Int]
    print $ sum intRange
    print $ fInits intRange
    print $ fTails intRange

fInits :: [Int] -> Int
fInits = sum . map sum . inits

fTails :: [Int] -> Int
fTails = sum . map sum . tails

After compiling my Main.hs file with

ghc -prof -fprof-auto -O2 -rtsopts Main.hs

and running

./Main +RTS -p

the Main.prof file reports the following:

COST CENTRE MODULE  %time %alloc

fInits      Main     60.1   64.9
fTails      Main     39.9   35.0

The memory allocated for fInits and that allocated for fTails have the same order of magnitude... Hum...

What is going on?

  • Are my conclusions about the space complexities of tails (linear) and inits (quadratic) correct?
  • If so, why does GHC allocate roughly as much memory for fInits as for fTails? Does list fusion have something to do with this?
  • Or is my benchmark flawed?
jub0bs
    My only guess would be: The intermediate `Int`s aren't optimized away, so `fTails` also makes O(n^2) allocations for those. One would have to look at the core to check that (I don't have a ghc at hand). –  Apr 01 '15 at 14:31
  • You should probably force the list (`print $ sum intRange`) before running either `fInits` or `fTails`. – Cirdec Apr 01 '15 at 14:36
  • @delnan Thanks. I'm not used to inspecting core yet, but I'll look into it. – jub0bs Apr 01 '15 at 14:39
  • @Cirdec Done. No changes. – jub0bs Apr 01 '15 at 14:40
  • For me inits allocates 2500 times as much memory as tails. – András Kovács Apr 01 '15 at 14:43
  • @AndrásKovács `(ಠ_ರ) ?` Are you using the same benchmark as me? Which version of GHC are you using? I've got GHC 7.10.1. – jub0bs Apr 01 '15 at 14:45
  • @Jubobs GHC 7.10.1 here, but I used `+RTS -s` instead of the profiler. – András Kovács Apr 01 '15 at 14:46
  • With windows GHC 7.8.3 I have `100.0%` allocation in `fInits` and `91.7%` of time spent in `fInits` (`+RTS -p`, same compiler options). – Cirdec Apr 01 '15 at 14:46
  • Ignore my GHC 7.8.3 results (and anyone else's). GHC [7.8.3 has a bug where inits is very slow](https://ghc.haskell.org/trac/ghc/ticket/9345). It was fixed in 7.8.4. – Cirdec Apr 01 '15 at 14:50
  • @AndrásKovács Where do you see a per-function breakdown in the output of `./Main +RTS -s`? What do you get if you use `+RTS -p`? – jub0bs Apr 01 '15 at 15:03
  • @Jubobs I just comment one or the other out when using `+RTS -s`. With `RTS -p` I got results similar to yours. – András Kovács Apr 01 '15 at 15:08
  • @AndrásKovács Thanks for clarifying. Do you have an explanation? – jub0bs Apr 01 '15 at 15:12
  • Nothing besides "the profiler is acting up". I looked at the Core and it's got nothing funny going on; it just calls `inits` and `tails` and sums. – András Kovács Apr 01 '15 at 15:18
  • @AndrásKovács Thanks. I just wanted confirmation that "something is afoot" and that I'm not completely crazy. – jub0bs Apr 01 '15 at 15:20

1 Answer


The implementation of inits in the Haskell Report, which is identical or nearly identical to the implementations used up to base 4.7.0.1 (GHC 7.8.3), is horribly slow. In particular, the fmap applications stack up recursively, so forcing successive elements of the result gets slower and slower.

inits [1,2,3,4] = [] : fmap (1:) (inits [2,3,4])
 = [] : fmap (1:) ([] : fmap (2:) (inits [3,4]))
 = [] : [1] : fmap (1:) (fmap (2:) ([] : fmap (3:) (inits [4])))
....
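For concreteness, here is the Report's definition, sketched under the hypothetical name initsReport so it can sit alongside Data.List.inits:

```haskell
-- The Haskell Report's definition of inits (hypothetical name
-- initsReport). One map (x:) wrapper piles up per input element, so
-- reaching the k-th prefix has to go through k stacked maps: even
-- producing just the spine of the result takes quadratic time.
initsReport :: [a] -> [[a]]
initsReport []       = [[]]
initsReport (x : xs) = [] : map (x :) (initsReport xs)
```

The results are correct (initsReport [1,2,3] is [[],[1],[1,2],[1,2,3]]); it is only the cost of forcing them that blows up.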

The simplest asymptotically optimal implementation, explored by Bertram Felgenhauer, is based on applying take with successively larger arguments:

inits xs = [] : go (1 :: Int) xs
  where
    -- the bang pattern (requires {-# LANGUAGE BangPatterns #-})
    -- keeps the length counter strict
    go !l (_:ls) = take l xs : go (l + 1) ls
    go _  []     = []

Felgenhauer was able to eke out some extra performance using a private, non-fusing version of take, but it was still not as fast as it could be.

The following very simple implementation is significantly faster in most cases:

inits = map reverse . scanl (flip (:)) []
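As a quick sanity check, this scanl-based version (given the hypothetical name initsScanl here) agrees with Data.List.inits:

```haskell
import Data.List (inits)

-- The one-liner from above, under the hypothetical name initsScanl.
-- scanl (flip (:)) [] builds each prefix in reverse, sharing tails
-- between successive accumulators; map reverse then copies each
-- prefix out in the right order.
initsScanl :: [a] -> [[a]]
initsScanl = map reverse . scanl (flip (:)) []
```

The sharing lives only in the reversed accumulators; the final map reverse is what recreates each prefix, consistent with the quadratic cost of forcing the full result.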

In some weird corner cases (like map head . inits), this simple implementation is asymptotically non-optimal. I therefore wrote a version using the same technique, but based on Chris Okasaki's Banker's queues, that is both asymptotically optimal and nearly as fast. Joachim Breitner optimized it further, primarily by using a strict scanl' rather than the usual scanl, and this implementation got into GHC 7.8.4.

inits can now produce the spine of the result in O(n) time; forcing the entire result requires O(n^2) time because none of the conses can be shared among the different initial segments.

If you want really absurdly fast inits and tails, your best bet is to use Data.Sequence; Louis Wasserman's implementation is magical. Another possibility would be to use Data.Vector, which presumably uses slicing for such things.

Will Ness
dfeuer