Idiomatic efficient Haskell append?

Question

Lists and the cons operator (:) are very common in Haskell. Cons is our friend. But sometimes I want to add to the end of a list instead.

xs `append` x = xs ++ [x]

This, sadly, is not an efficient way to implement it: ++ has to copy every cons cell of xs, so a single append costs O(length xs).
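For a sense of scale, here is a small sketch comparing a list built by repeated `++ [x]` with the common workaround of consing onto the front and reversing once at the end. The helper names `snocAll` and `consThenReverse` are made up for illustration; the complexity notes in the comments assume the whole result is eventually traversed.

```haskell
import Data.List (foldl')

-- Hypothetical helper: append each element at the end with (++).
-- The appends nest to the left, so fully traversing the result costs
-- O(n^2): the element added at step i is pulled through the n - i
-- appends that come after it.
snocAll :: [a] -> [a]
snocAll = foldl' (\acc x -> acc ++ [x]) []

-- Hypothetical helper: cons each element onto the front (O(1) per step),
-- then reverse once at the end (O(n)), for O(n) overall.
consThenReverse :: [a] -> [a]
consThenReverse = reverse . foldl' (flip (:)) []

main :: IO ()
main = do
  print (snocAll [1 .. 5 :: Int])          -- [1,2,3,4,5]
  print (consThenReverse [1 .. 5 :: Int])  -- [1,2,3,4,5]
```

Both produce the same list; only the amount of work differs.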

I wrote up Pascal's triangle in Haskell, but I had to use the ++ [x] anti-idiom:

ptri = [1] : mkptri ptri
mkptri (row:rows) = newRow : mkptri rows
    where newRow = zipWith (+) row (0:row) ++ [1]

IMHO, this is a lovely, readable Pascal's triangle and all, but the anti-idiom irks me. Can someone explain to me (and, ideally, point me to a good tutorial on) what the idiomatic data structure is for cases where you want to append to the end efficiently? I'm hoping for near-list-like beauty in this data structure and its methods. Or, alternatively, explain to me why this anti-idiom is actually not that bad for this case (if you believe that to be so).

[edit] The answer I like the best is Data.Sequence, which does indeed have "near-list-like beauty." Not sure how I feel about the required strictness of operations. Further suggestions and different ideas are always welcome.

import Data.Sequence ((|>), (<|), zipWith, singleton)
import Prelude hiding (zipWith)

ptri = singleton 1 : mkptri ptri

mkptri (seq:seqs) = newRow : mkptri seqs
    where newRow = zipWith (+) seq (0 <| seq) |> 1

Now we just need List to be a class, so that other structures can use its methods like zipWith without hiding the Prelude version or qualifying it. :P

List isn't a class, but ListLike is. There's a Data.Sequence instance available too. http://hackage.haskell.org/package/ListLike-3.0.1 — John L, Mar 04 '11 at 12:19
Random annoyance: I first tried to write `newRow = 1 <| zipWith (+) seq (drop 1 seq) |> 1`, which imho is very beautiful for expressing Pascal's triangle by explicitly showing the 1 at both ends of every row. Sadly, I got this error: `cannot mix '<|' [infixr 5] and '|>' [infixl 5] in the same infix expression` — Dan Burton, Mar 05 '11 at 22:46
That second `newRow` is very beautiful. I had a not-ideally-successful go at it with ZipLists (I was planning something more general, but it was too complicated), http://hpaste.org/44613/pascals_ziplist. With the idiom brackets of the `she` preprocessor, and a few homemade combinators it looks like this: `pascalsNextLine old = 1 <& (| tail' old + init' old |) &> 1` It would be interesting to know if there is anything to generalize in this idea of yours. — applicative, Mar 08 '11 at 05:44
It seems that `|>` and `<|` have the same precedence, so they can't go next to each other. I wonder if there is a way to change that on a per-file basis without breaking other files. — Theo Belaire, Mar 13 '11 at 14:20
I was wondering about this too. One idea I had was to use an O(1) `cons` as an append, but then to O(n) `reverse` the resulting list. See also: the contrast between the implementations of [`map`](http://hackage.haskell.org/package/base-4.12.0.0/docs/src/GHC.Base.html#map) versus [`reverse`](http://hackage.haskell.org/package/base-4.12.0.0/docs/src/GHC.List.html#reverse). — Mateen Ulhaq, Feb 20 '19 at 07:58
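The fixity clash from the comments can be sidestepped with explicit parentheses. As far as I know, a fixity declaration has to live in the same module as the operator it describes, so you can't re-declare the fixities of `<|` and `|>` in your own file; parenthesizing is the simplest per-file workaround. This sketch swaps the asker's `mkptri` recursion for `iterate`, and the name `nextRow` is illustrative.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, (<|), (|>))
import qualified Data.Sequence as Seq

-- Pascal's triangle as a list of sequences, writing the 1 at both ends.
-- The parentheses are required: (<|) is infixr 5 and (|>) is infixl 5,
-- so GHC refuses to guess how `1 <| xs |> 1` should associate.
nextRow :: Seq Integer -> Seq Integer
nextRow row = (1 <| Seq.zipWith (+) row (Seq.drop 1 row)) |> 1

ptri :: [Seq Integer]
ptri = iterate nextRow (Seq.singleton 1)

main :: IO ()
main = mapM_ (print . toList) (take 5 ptri)
-- [1]
-- [1,1]
-- [1,2,1]
-- [1,3,3,1]
-- [1,4,6,4,1]
```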

score 30 · Answer 1 · answered Mar 05 '11 at 19:38

Keep in mind that what looks like poor asymptotic behavior might not actually be, because you are working in a lazy language. In a strict language, appending to the end of a linked list in this way would always be O(n). In a lazy language, it's O(n) only if you actually traverse to the end of the list, in which case you would have spent O(n) effort anyway. So in many cases, laziness saves you.

This isn't a guarantee, though. For example, k appends followed by a traversal will still run in O(nk) where it could have been O(n+k). But it does change the picture somewhat: reasoning about the performance of single operations by their asymptotic complexity, as if each result were forced immediately, doesn't always give you the right answer in the end.
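A small sketch of both halves of that argument: `(++)` only does work for the part of its left argument that is demanded, but a chain of left-nested appends still costs O(nk) once the whole result is traversed. The names `firstFew` and `appendMany` are hypothetical.

```haskell
-- (++) is lazy: this terminates even though the left list is infinite,
-- because only three elements are ever demanded.
firstFew :: [Integer]
firstFew = take 3 ([1 ..] ++ [0])

-- k left-nested single-element appends onto a list of length n.
-- Each of the n original elements is pulled through all k pending
-- appends, so traversing the whole result costs O(n*k), not O(n + k).
appendMany :: [a] -> [a] -> [a]
appendMany = foldl (\acc x -> acc ++ [x])

main :: IO ()
main = do
  print firstFew                                              -- [1,2,3]
  print (head (appendMany [1 .. 10] [11 .. 20 :: Int]))       -- 1 (forces only the front)
  print (length (appendMany [1 .. 1000] [1 .. 1000 :: Int]))  -- 2000 (forces the whole spine)
```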

score 17 · Accepted Answer · answered Mar 04 '11 at 01:27


The standard Data.Sequence (from the containers package) has O(1) addition at both ends and O(log(min(n1,n2))) for general concatenation:

http://hackage.haskell.org/packages/archive/containers/latest/doc/html/Data-Sequence.html

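The difference from lists, though, is that a Seq is strict in its structure (its spine), so it is always finite; the elements themselves are still evaluated lazily.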
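A minimal sketch of the operations mentioned above: `(|>)` to add at the right end, `(<|)` at the left end, and `(><)` to concatenate. The helper `fromListBySnoc` is hypothetical (the library already provides `Seq.fromList`); it only exists to show repeated snoc without the `++ [x]` cost.

```haskell
import Data.Foldable (toList)
import Data.List (foldl')
import Data.Sequence (Seq, (<|), (|>), (><))
import qualified Data.Sequence as Seq

-- Hypothetical helper: build a Seq by adding each element at the right end.
-- Each (|>) is O(1) (amortized), unlike xs ++ [x] on a list.
fromListBySnoc :: [a] -> Seq a
fromListBySnoc = foldl' (|>) Seq.empty

main :: IO ()
main = do
  let s = fromListBySnoc "bcd"
  print (toList (('a' <| s) |> 'e'))        -- "abcde"
  print (toList (s >< Seq.fromList "xyz"))  -- "bcdxyz"
```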

answered Mar 04 '11 at 01:27

Ed'ka


One of the most frustrating things for new users of Data.Sequence is that there aren't many functions exported with it. You need to make use of the Functor, Foldable, Monoid, and Traversable instances to gain access to many common operations. – John L Mar 04 '11 at 12:22

@John: Yes, but this is a virtue in many ways. Once you know those operations you can use them on almost any data structure. – Edward Kmett Mar 05 '11 at 22:01
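To illustrate the point about the class instances, here is a sketch of common operations on a Seq that come from Foldable, Functor, Semigroup/Monoid and Traversable rather than from Data.Sequence itself; the same calls work unchanged on ordinary lists. It assumes a reasonably recent GHC in which `(<>)` is exported from the Prelude. The names `row` and `positive` are illustrative.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- A sample row; nothing here is specific to Pascal's triangle.
row :: Seq Integer
row = Seq.fromList [1, 3, 3, 1]

-- Keep positive entries, fail otherwise (used to show traverse).
positive :: Integer -> Maybe Integer
positive x = if x > 0 then Just x else Nothing

main :: IO ()
main = do
  print (sum row)                              -- Foldable: 8
  print (length row)                           -- Foldable: 4
  print (toList (fmap (* 2) row))              -- Functor: [2,6,6,2]
  print (toList (row <> Seq.fromList [0]))     -- Semigroup: [1,3,3,1,0]
  print (fmap toList (traverse positive row))  -- Traversable: Just [1,3,3,1]
```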

score 10 · Answer 3 · answered Mar 04 '11 at 04:07


Something like this explicit recursion avoids your append "anti-idiom", although I don't think it is as clear as your example.

ptri = []:mkptri ptri
mkptri (xs:ys) = pZip xs (0:xs) : mkptri ys
    where pZip (x:xs) (y:ys) = x+y : pZip xs ys
          pZip [] _ = [1]

answered Mar 04 '11 at 04:07

David Powell


+1 Clever answer. You're right, it's less clear, but it does avoid the anti-idiom, *and* it's about as efficient as it possibly can be (afaik). Not the "accepted" answer, though, cuz I was hoping for a general-use solution. You could actually make the last line `pZip _ _ = [1]` – Dan Burton Mar 04 '11 at 04:24

score 8 · Answer 4 · answered Mar 05 '11 at 21:52

In your code for Pascal's Triangle, ++ [x] is not actually a problem. Since you have to produce a new list on the left-hand side of ++ anyway, your algorithm is inherently quadratic; you cannot make it asymptotically faster merely by avoiding ++.

Also, in this particular case, when you compile with -O2, GHC's list fusion rules should eliminate the copy of the list that ++ would normally create. This is because zipWith is a good producer and ++ is a good consumer in its first argument. You can read about these optimizations in the GHC User's Guide.

+1 Cool, I imagined something like this might be true. Great link. — Dan Burton, Mar 05 '11 at 22:39

score 5 · Answer 5 · answered Mar 04 '11 at 03:27


Depending on your use case, the ShowS method (appending via function composition) might be useful.
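A small sketch of the ShowS idea: `ShowS` is just `String -> String`, composing pieces with `(.)` is O(1), and the string is only built when the final function is applied to `""`. `showString` and `showChar` are standard Prelude functions; `greeting` and `withName` are hypothetical names.

```haskell
-- Each piece prepends its text to whatever string it is eventually given.
greeting :: ShowS
greeting = showString "Hello" . showChar ',' . showChar ' '

-- "Appending at the end" is just composing on the right.
withName :: String -> ShowS
withName name = greeting . showString name . showChar '!'

main :: IO ()
main = putStrLn (withName "world" "")  -- Hello, world!
```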

answered Mar 04 '11 at 03:27

geekosaur


Given the exact algorithm in use, I'd use the `ShowS` approach. But it's only going to be a constant-factor improvement anyway: building the row in question is already O(n), and adding another O(n) step doesn't make it too much worse. – Carl Mar 04 '11 at 04:08

score 5 · Answer 6 · answered Mar 04 '11 at 08:59

If you just want cheap append (concat) and snoc (cons at the right end), a Hughes list, also called DList on Hackage, is the simplest to implement. If you want to know how they work, look at Andy Gill and Graham Hutton's first Worker Wrapper paper; John Hughes's original paper doesn't seem to be online. As others have said above, ShowS is a String-specialized Hughes list / DList.
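A minimal hand-rolled Hughes list, as a sketch of the idea: a list is represented by the function that prepends its elements, so append and snoc are just function composition. The dlist package on Hackage provides a full-featured version; the names below (`HList`, `snocH`, and so on) belong to this sketch only and are not claimed to match that package's API.

```haskell
-- A list represented as "prepend my elements to whatever tail you give me".
newtype HList a = HList ([a] -> [a])

emptyH :: HList a
emptyH = HList id

singletonH :: a -> HList a
singletonH x = HList (x :)

fromListH :: [a] -> HList a
fromListH xs = HList (xs ++)

-- O(1): append is function composition.
appendH :: HList a -> HList a -> HList a
appendH (HList f) (HList g) = HList (f . g)

-- O(1) cons and snoc, both defined via append.
consH :: a -> HList a -> HList a
consH x h = appendH (singletonH x) h

snocH :: HList a -> a -> HList a
snocH h x = appendH h (singletonH x)

-- O(n): convert back to an ordinary list once, at the end.
toListH :: HList a -> [a]
toListH (HList f) = f []

main :: IO ()
main = print (toListH (foldl snocH emptyH [1 .. 5 :: Int]))  -- [1,2,3,4,5]
```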

A JoinList is a bit more work to implement. It is a binary tree with a list API: concat and snoc are cheap, and you can reasonably fmap it. The DList on Hackage has a Functor instance, but I contend it shouldn't: that instance has to metamorph in and out of a regular list. If you want a JoinList, you'll need to roll your own; the one on Hackage is mine, and it's neither efficient nor well written.
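A minimal join list, as a sketch of the structure described above (not the JoinList package from Hackage): concatenation just allocates a `Join` node, and `fmap` works directly on the tree without converting to a list. All names here are hypothetical.

```haskell
-- A binary tree whose leaves, read left to right, are the list elements.
data JoinList a
  = Empty
  | Single a
  | Join (JoinList a) (JoinList a)
  deriving Show

-- fmap maps over the tree in place; no round trip through [a].
instance Functor JoinList where
  fmap _ Empty = Empty
  fmap f (Single x) = Single (f x)
  fmap f (Join l r) = Join (fmap f l) (fmap f r)

-- O(1) concatenation: allocate one node (skipping empty sides).
append :: JoinList a -> JoinList a -> JoinList a
append Empty r = r
append l Empty = l
append l r = Join l r

-- O(1) snoc and cons.
snoc :: JoinList a -> a -> JoinList a
snoc l x = append l (Single x)

cons :: a -> JoinList a -> JoinList a
cons x r = append (Single x) r

-- O(n) left-to-right flattening, using an accumulator to stay linear.
toList :: JoinList a -> [a]
toList t = go t []
  where
    go Empty acc = acc
    go (Single x) acc = x : acc
    go (Join l r) acc = go l (go r acc)

main :: IO ()
main = print (toList (fmap (* 10) (snoc (cons 1 (Single 2)) (3 :: Int))))  -- [10,20,30]
```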

Data.Sequence has efficient cons and snoc, and it is also good at other operations, such as take and drop, that a JoinList is slow for. Because the finger tree inside Data.Sequence has to keep itself balanced, append is more work than its JoinList equivalent. In practice, though, because Data.Sequence is better written, I'd expect it still to outperform my JoinList for append.
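A short sketch of the positional operations mentioned above. According to the containers documentation, `index`, `take`, `drop` and `splitAt` on a Seq run in O(log(min(i, n - i))), whereas the list versions are O(i). The name `digits` is illustrative.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

digits :: Seq Int
digits = Seq.fromList [0 .. 9]

main :: IO ()
main = do
  print (Seq.index digits 7)          -- 7
  print (toList (Seq.take 3 digits))  -- [0,1,2]
  print (toList (Seq.drop 7 digits))  -- [7,8,9]
  let (front, back) = Seq.splitAt 4 digits
  print (toList front, toList back)   -- ([0,1,2,3],[4,5,6,7,8,9])
```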

score 4 · Answer 7 · answered Mar 05 '11 at 04:37

Another way would be to avoid concatenation altogether by using infinite lists:

ptri = zipWith take [0,1..] ptri'
  where ptri' = iterate stepRow $ repeat 0
        stepRow row = 1 : zipWith (+) row (tail row)

edited Mar 06 '11 at 01:40

answered Mar 05 '11 at 04:37

rampion


You can write that as: `ptri = zipWith take [1..] . iterate ((zipWith (+) <*> tail) . (0:)) $ 1 : repeat 0` (assuming that you have imported `Control.Applicative`). Another interesting way to do it is: `ptri = zipWith take [1..] . transpose . zipWith (++) (iterate (0 :) []) . iterate (scanl1 (+)) $ repeat 1` – Yitz Mar 06 '11 at 12:03

score 3 · Answer 8 · answered Mar 05 '11 at 18:59


I wouldn't necessarily call your code "anti-idiomatic". Oftentimes clearer is better, even if that means sacrificing a few clock cycles.

And in your particular case, the append at the end doesn't actually change the big-O time complexity! Evaluating the expression

zipWith (+) xs (0:xs) ++ [1]

will take time proportional to length xs, and no fancy sequence data structure is going to change that. If anything, only the constant factor will be affected.
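As a rough worked count, numbering rows from 1 so that row k has k elements: both the zipWith and the ++ [1] walk the previous row once, so

$$
\text{cost}(\text{row } k) = \underbrace{O(k)}_{\texttt{zipWith}} + \underbrace{O(k)}_{\texttt{++ [1]}} = O(k),
\qquad
\sum_{k=1}^{n} O(k) = O(n^2).
$$

The first n rows contain $\sum_{k=1}^{n} k = n(n+1)/2$ numbers in total, so any representation has to do $\Omega(n^2)$ work just to produce them; avoiding ++ can only change the constant.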

answered Mar 05 '11 at 18:59

Heinrich Apfelmus


Very true, the total work for each row is still O(n) either way. I liked the Data.Sequence solution, though, because it allowed the same clarity (or, I'd say, slightly more clarity), and it also lessens the amount of O(n) work that must be done per row. – Dan Burton Mar 05 '11 at 20:00
And as lpsmith pointed out above, you don't even lose any clock cycles in this case. GHC is smart enough to optimize them away. – Yitz Mar 06 '11 at 11:27

Tim Perry · Answer 9 · 2011-03-04T05:35:10.100

Chris Okasaki has a design for a queue that addresses this issue. See page 15 of his thesis http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

You may need to adapt the code slightly, but keeping the list in two pieces (a front part and a reversed back part) and reversing only occasionally lets you add to the end in O(1) and remove from the front in amortized O(1).
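A minimal sketch of the two-list idea, assuming the simple "batched" queue rather than the more refined lazy queues in the thesis. The amortized O(1) bound assumes each queue version is used only once (no persistent reuse of old versions). All names are hypothetical.

```haskell
-- Front list in order, back list reversed: Queue [1,2] [4,3] holds 1,2,3,4.
data Queue a = Queue [a] [a]

emptyQ :: Queue a
emptyQ = Queue [] []

-- Add to the end in O(1): cons onto the reversed back list.
snocQ :: Queue a -> a -> Queue a
snocQ (Queue f b) x = Queue f (x : b)

-- Take from the front; when the front runs out, reverse the back once.
-- Each element is reversed at most once, hence amortized O(1).
unconsQ :: Queue a -> Maybe (a, Queue a)
unconsQ (Queue [] []) = Nothing
unconsQ (Queue [] b) = unconsQ (Queue (reverse b) [])
unconsQ (Queue (x : f) b) = Just (x, Queue f b)

-- Drain the queue into an ordinary list, front first.
toListQ :: Queue a -> [a]
toListQ q = case unconsQ q of
  Nothing -> []
  Just (x, q') -> x : toListQ q'

main :: IO ()
main = print (toListQ (foldl snocQ emptyQ "abcde"))  -- "abcde"
```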

Also, someone published list code with efficient operations in The Monad Reader. I admit I didn't really follow it, but I think I could figure it out if I concentrated. It turns out it was Douglas M. Auclair, in Monad Reader issue 17: http://themonadreader.files.wordpress.com/2011/01/issue17.pdf

I realize the above answer does not directly address the question. So, for giggles, here is my recursive answer. Feel free to tear it apart; it is not pretty.

ptri :: [[Int]]
ptri = [1] : mkptri ptri

mkptri :: [[Int]] -> [[Int]]
mkptri (xs:ys) =  mkptri' xs : mkptri ys

mkptri' :: [Int] -> [Int]
mkptri' xs = 1 : mkptri'' xs

mkptri'' :: [Int] -> [Int]
mkptri'' [x]        = [x]
mkptri'' (x:y:rest) = (x + y):mkptri'' (y:rest)

score 2 · Answer 10 · answered Mar 04 '11 at 04:27

I wrote an example of @geekosaur's ShowS approach. You can see many examples of ShowS in the Prelude.

ptri = []:mkptri ptri
mkptri (xs:ys) = (newRow xs []) : mkptri ys

newRow :: [Int] -> [Int] -> [Int]
newRow xs = listS (zipWith (+) xs (0:xs)) . (1:)

listS :: [a] -> [a] -> [a]
listS [] = id
listS (x:xs) = (x:) . listS xs

[edit] Following @Dan's idea, I rewrote newRow with zipWithS.

newRow :: [Int] -> [Int] -> [Int]
newRow xs = zipWithS (+) xs (0:xs) . (1:)

zipWithS :: (a -> b -> c) -> [a] -> [b] -> [c] -> [c]
zipWithS z (a:as) (b:bs) xs =  z a b : zipWithS z as bs xs
zipWithS _ _ _ xs = xs

edited Mar 05 '11 at 05:39

answered Mar 04 '11 at 04:27

Takashi Yamamiya


Ok, maybe I'm just dumb, but I don't see the difference between `listS` and `++`, [given the definition of `++`](http://hackage.haskell.org/packages/archive/base/latest/doc/html/src/GHC-Base.html#%2B%2B), which is the same, just less point-free: `(++) :: [a] -> [a] -> [a]`, `(++) [] ys = ys`, `(++) (x:xs) ys = x : xs ++ ys` – rampion Mar 05 '11 at 00:01
Or you could make `zipWithS` since `zipWith` has to go through each item anyways. – Dan Burton Mar 05 '11 at 04:53
No, I was dumb. Now I realize that these are the same, as is `showString = (++)` in the Prelude. So newRow can be `newRow xs = (zipWith (+) xs (0:xs) ++) . (1:)` – Takashi Yamamiya Mar 05 '11 at 05:08
@Dan: I like the idea. My point was that the stream style (composing [a] -> [a] functions instead of concatenating lists) is an idiom, and I've seen it in many places. – Takashi Yamamiya Mar 05 '11 at 05:48

score 1 · Answer 11 · answered Mar 04 '11 at 14:59

If you're looking for a general purpose solution, then how about this:

import Prelude hiding (map, zip, zipWith)  -- these are redefined below

mapOnto :: [b] -> (a -> b) -> [a] -> [b]
mapOnto bs f = foldr ((:).f) bs

This gives a simple alternate definition for map:

map = mapOnto []

We can give a similar definition for other foldr-based functions, like zipWith:

zipOntoWith :: [c] -> (a -> b -> c) -> [a] -> [b] -> [c]
zipOntoWith cs f = foldr step (const cs)
  where step x g [] = cs
        step x g (y:ys) = f x y : g ys

Again deriving zipWith and zip fairly easily:

zipWith = zipOntoWith []
zip = zipWith (\a b -> (a,b))

Now if we use these general purpose functions, your implementation becomes pretty easy:

ptri :: (Num a) => [[a]]
ptri = [] : map mkptri ptri
  where mkptri xs = zipOntoWith [1] (+) xs (0:xs)

HackerFoo · Answer 12 · 2011-03-11T07:06:48.593

You can represent a list as a function that builds a list when applied to [] (or to any other tail):

list1, list2 :: [Integer] -> [Integer]
list1 = \xs -> 1 : 2 : 3 : xs
list2 = \xs -> 4 : 5 : 6 : xs

Then you can easily append lists and add to either end.

list1 . list2 $ [] -> [1,2,3,4,5,6]
list2 . list1 $ [] -> [4,5,6,1,2,3]
(7:) . list1 . (8:) . list2 $ [9] -> [7,1,2,3,8,4,5,6,9]

You can rewrite zipWith to return these partial lists:

zipWith' _ [] _ = id
zipWith' _ _ [] = id
zipWith' f (x:xs) (y:ys) = (f x y :) . zipWith' f xs ys

And now you can write ptri as:

ptri = [] : mkptri ptri
mkptri (xs:yss) = newRow : mkptri yss
    where newRow = zipWith' (+) xs (0:xs) [1]

Taking it further, here's a one-liner that's more symmetrical:

ptri = ([] : ) . map ($ []) . iterate (\x -> zipWith' (+) (x [0]) (0 : x [])) $ (1:)

Or this is simpler yet:

ptri = [] : iterate (\x -> 1 : zipWith' (+) (tail x) x [1]) [1]

Or without zipWith' (mapAccumR is in Data.List):

ptri = [] : iterate (uncurry (:) . mapAccumR (\x x' -> (x', x+x')) 0) [1]
