Group adjacent values in a list in Haskell

Question

If I have the list [1,2,3,4,5,6,7] and I want to group 3 (or any other number) adjacent values so I end up with the list: [[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7]]

How would I go about doing this in Haskell?

Daniel Wagner · Accepted Answer · 2018-03-18T23:10:11.333

6

Data.List> f n xs = zipWith const (take n <$> tails xs) (drop (n-1) xs)
Data.List> f 3 [1..7]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7]]

Some brief explanations: tails gives us the lists that start at each position in the original. We only want the first few elements of each of these, so we run take n on each. This gets us most of the way there, but leaves a few extra dangling lists at the end; in your example they would be the ones starting from 6, 7, and beyond, so [[6,7],[7],[]]. We could remove them by computing the length of the input list and keeping only the right number of lists, but this doesn't work well for infinite input lists or partially defined input lists. Instead, since the output should always be n-1 elements shorter than the input, we use a standard-ish zipWith trick to cut off the extra elements.
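To see the intermediate steps, here is a minimal sketch that splits the one-liner into named pieces. The names `prefixes` and `windows` are made up for illustration and are not part of Data.List.

```haskell
import Data.List (tails)

-- Hypothetical helper: at most n elements from every suffix of the input.
-- The last few results are shorter than n (the "dangling" lists).
prefixes :: Int -> [a] -> [[a]]
prefixes n xs = map (take n) (tails xs)

-- Hypothetical name for the answer's f: zipWith const keeps one prefix
-- per element of (drop (n - 1) xs), which cuts off the short ones
-- without ever computing the length of xs.
windows :: Int -> [a] -> [[a]]
windows n xs = zipWith const (prefixes n xs) (drop (n - 1) xs)
```

prefixes 3 [1..7] still ends with [6,7], [7] and [], while windows 3 [1..7] drops them. Because nothing asks for the length of the input, take 3 (windows 3 [1..]) also terminates.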

edited Mar 18 '18 at 23:10

answered Mar 18 '18 at 23:05


@WillNess Thanks for finding that, I've marked this question as a duplicate. – Daniel Wagner Mar 26 '18 at 17:23
@WillNess Nothing embarrassing there, that's how 95% of my dupehammers happen too. And yes, I expect the `foldr (zipWith (:))` version is preferable. – Daniel Wagner Mar 26 '18 at 17:30

score 2 · Answer 2 · edited Jun 20 '20 at 09:12

An alternative explanation of @DanielWagner's solution from a slightly higher level of abstraction:

Original solution:
f n xs = zipWith const (take n <$> tails xs) (drop (n-1) xs)
take n <$> tails xs uses the nondeterminism monad:
type NonDet a = [a]
-- instance Monad NonDet
tails :: [a] -> NonDet [a]
tails nondeterministically "chooses" where the sublist begins, and then the pure function
take n :: [a] -> [a]
is fmap'd under the NonDeterminism layer to chop the tail off. This leaves some flab at the result's end, so we go into the plumbing of NonDet/[] with zipWith to fix it.

This new explanation also opens up an optimization. The [] monad has a concept of failure, which is the empty list. If we had a version of take that would fail monadically when it had a too-short argument, we could use it and not worry about removing the short sublists at the end of the result. So:

import Control.Monad((<=<))
import Data.Maybe(maybeToList)

-- Maybe is the simplest failure monad.
-- Doesn't return [[a]] because this could conceivably be used in other
-- contexts and Maybe [a] is "smaller" and clearer than [[a]].
-- "safe" because, in the context of (do xs <- safeTake n ys),
-- length xs == n, definitely.
safeTake :: Int -> [a] -> Maybe [a]
safeTake 0 _ = return []
safeTake n [] = Nothing
safeTake n (x:xs) = (x:) <$> (safeTake $ n - 1) xs

-- maybeToList :: Maybe a -> [a]
-- maybeToList (return x) = return x / maybeToList (Just x) = [x]
-- maybeToList empty      = empty    / maybeToList Nothing  = [ ]
-- (.)   ::            (b ->   c) -> (a ->   b) -> (a ->   c)
-- (<=<) :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
f n = maybeToList . safeTake n <=< tails

f no longer digs through the nondeterminism abstraction with something that is outside the monad. It can also be written in terms of Kleisli composition, which certainly gives it points in the beauty category. A criterion benchmark also shows a 15-20% speedup (under -O2). Personally, I think it's cool that seeing something more abstractly and making the code "prettier" can also confer performance.

I suspect your `$!`s won't actually do anything because `fmap` and `safeTake` are already strict in those arguments. I experimented with a version of `safeTake` which returned `(success, result) :: (Bool, [a])` (and a wrapper function which inspected `success` and returned the appropriate `Maybe`), and the generated Core looks better - the tuples got unboxed so there's less wrapping and unwrapping of heap objects than the `Maybe` version. Didn't benchmark it though because it's late. Might also be fun to try a tail recursive version... — Benjamin Hodgson, Mar 19 '18 at 02:34
@BenjaminHodgson [`foldr (zipWith (:)) (repeat []) . take n . tails`](https://stackoverflow.com/a/24609264/849891) does it all by itself. I don't know if it fuses or not, though... — Will Ness, Mar 26 '18 at 17:19
@HTNW interesting. another way to write `maybeToList . safeTake n` is `map fst . maybeToList . runStateT (sequence $ StateT uncons <$ [1..n])`. Still it'll work through the leftovers, failing safely for each; `foldr (zipWith (:)...` mentioned above just ignores them. Also, this, as does `safeTake`, effectively checks the success of each `uncons`, while `zipWith` approach just works. (and of course the repeated `take n`s traverse over same elements multiple times...) — Will Ness, Mar 27 '18 at 01:29
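
The `foldr (zipWith (:))` version mentioned in these comments can be written as a standalone function. This is only a sketch, and the name `windowsT` is hypothetical. It takes the first n suffixes and zips them together column by column; since zipWith stops at the shortest list, the short windows never appear.

```haskell
import Data.List (tails)

-- Hypothetical name. take n (tails xs) gives the n suffixes that start at
-- offsets 0 .. n-1; folding with zipWith (:) conses them together
-- position by position, stopping when the shortest suffix runs out.
windowsT :: Int -> [a] -> [[a]]
windowsT n = foldr (zipWith (:)) (repeat []) . take n . tails
```

One caveat of this sketch: for n = 0 the fold has nothing to zip against and returns the infinite list repeat [].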

Elmex80s · Answer 3 · 2018-03-19T13:46:09.813

2

let x:y:ls = [1,2,3,4,5,6,7] in zip3 (x:y:ls) (y:ls) ls

Will give

[(1,2,3),(2,3,4),(3,4,5),(4,5,6),(5,6,7)]

Tuples instead of lists. If you want lists then apply \(a, b, c) -> [a, b, c]. Or do

let x:y:ls = [1,2,3,4,5,6,7] in [[a, b, c] | (a, b, c) <- zip3 (x:y:ls) (y:ls) ls]
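
The let pattern x:y:ls only matches lists with at least two elements, so it would raise a pattern-match error at run time on shorter input. A minimal sketch of the same zip3 idea as a total function, using drop instead of pattern matching (the name `triples` is made up for illustration):

```haskell
-- Hypothetical name. Zips the list with itself shifted by one and by two
-- positions; zip3 stops at the shortest argument, so inputs with fewer
-- than three elements simply give [].
triples :: [a] -> [[a]]
triples xs = [[a, b, c] | (a, b, c) <- zip3 xs (drop 1 xs) (drop 2 xs)]
```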

edited Mar 19 '18 at 13:46

answered Mar 19 '18 at 13:39

Elmex80s


1

I was so impressed with the use of pattern matching in a let function. Also zip3 is the perfect tool. I would prefer, however, `z3 n = [[x,y,z] | (x,y,z) <- zip3 [1..n] [2..n] [3..n]]`, which also produces the triples in lists. – fp_mora Mar 28 '18 at 17:27
1

@fpmora yes nice solution, however I assumed he wants a general solution, otherwise this `[(i, i + 1, i + 2) | i <- [1 .. (n - 2)]]` would have done the job as well. – Elmex80s Mar 28 '18 at 20:37

score 0 · Answer 4 · answered Mar 19 '18 at 00:35

0

Alternative to Daniel's very clever answer, you can use take and explicit recursion.

f _ []     = []
f n (x:xs) | length xs < (n-1) = []
           | otherwise         = (x : take (n-1) xs) : f n xs

However this will end up being substantially slower since it's necessary to force length xs so many times.
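
One way to keep the explicit recursion without measuring the whole remaining list is to measure only the candidate window. This is a sketch under that assumption, with the hypothetical name `windowsRec`, not a rewrite of the answer's f:

```haskell
-- Hypothetical name. length is applied to the window w, which has at
-- most n elements, so each step inspects a bounded prefix of the input.
windowsRec :: Int -> [a] -> [[a]]
windowsRec _ [] = []
windowsRec n xs@(_ : rest)
  | length w < n = []                    -- fewer than n elements left: stop
  | otherwise    = w : windowsRec n rest
  where
    w = take n xs
```

Since only the first n elements are ever examined per step, this variant also produces results lazily on infinite input, for example take 2 (windowsRec 3 [1..]).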


Adam Smith


1

In addition to being slower, this will outright fail to terminate on infinite lists, while the other answer will produce a correct infinite list result. – Silvio Mayolo Mar 19 '18 at 00:47
Even if it isn't as useful as the other example it's still nice to see other solutions for learning purposes. – Qwertie Mar 19 '18 at 00:55
Indeed, and I was also going to suggest [ [x-2,x-1, x] | x <- [3..n] ]. Within your comprehension the result can be in square brackets, as I'm sure you well know. Thank you. – fp_mora Mar 28 '18 at 21:28

fp_mora · Answer 5 · 2018-05-08T02:55:33.773

It doesn't appear that Haskell will iterate over a list 3 at a time, and list comprehensions won't either. Without iteration, it seems, there is no way to handle infinite lists. IDK, I'm too new to Haskell. All I could come up with is a recursive function. Ugh. One feature of the function is that it can be parameterized, so it can produce sub-lists of arbitrary size. Using Haskell pattern matching requires specifying the number of elements in each sub-list, such as (x:y:z:xs) for 3, and Haskell will reduce xs by 3 on each iteration. Four at a time would be (w:x:y:z:xs). This introduces bad hard-coding into the function. So this function reduces xs using drop 3 instead, and the 3 can be a parameter, as it can be in take 3. A helper function that takes the size of each sub-list and passes the constant [] (the empty list) to the primary function as the first parameter would be helpful.

fl2 (ys) (xs) = if null xs then ys else (fl2 (ys ++ [take 3 xs]) (drop 3 xs))

fl2 [] [1..12] produces [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]

What is interesting is that the function pair [take 3 l, drop 3 l], applied to [1..12], performs a single step and gives [[1,2,3],[4,5,6,7,8,9,10,11,12]]. This is basically what the fl2 function uses, but the sub-lists must accumulate.

Edit 3/19/18 Messing around with this I found, to my surprise, that this works. I could probably clean it up a lot but, for now ...

f7 ls = (take 3 ls, drop 3 ls)
f8 (a,b) = if null b then [a] else a : (f8 (f7 b))

How this is run is not pretty, but...

f8 $ f7 [1..12]

produces [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]. This IMO is still better than passing a [] as a parameter.

This last function, and probably the previous one, handles [], [1], and lists whose length is not a multiple of 3 without error, truncating the very last sub-list accordingly. None of this was a consideration when writing the function, but it is a result.

Edit 3/23/2018

Well, thanks to dfeuer, I tried splitAt instead of (\xs -> (take 3 xs, drop 3 xs)). I also changed the syntax of the single line function to not use if-then-else. Calling the function is still the same. A third wrapper function might well be in order.

f7 = (splitAt 3)
f8 (a,b) | null b = [a] | True = a : (f8 $ f7 b)

I am smitten by the use of pattern matching guards in single line functions. If-then-else is so ugly. I agree that 'if' should be a function like it is in lisp-like languages.

Edit 3/26/2018

Errrr. I don't know how I got the specification wrong. The result list is [1,2,3],[2,3,4],[3,4,5], not [1,2,3],[4,5,6],[7,8,9]. I feel dumber than normal. Here are two revised functions to produce the correct result list. The first function generates all possible triples in range because the sub-lists of the result are triples.

lt = [[a,b,c] | a <- [1..10], b <- [2..11], c <- [3..12]]

The second function picks out the correct elements of 'lt' by a calculated index value. For the 12 element input list the index values are 0, 111, 222, ..., 999, so they are multiples of 111. So here for an input list of [1..12]

[lt!!(x*111) | x <- [0..9]]

Produces

[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10,11],[10,11,12]]

Edit 3/27/2018 Well, in the list generated to pick values from, the last value of each set was the next one needed in any list. I was recently taught to look closely at the generated lists together. I generated a few lists from the lt generating function above. The last elements of each list were the exact values needed for a list of any size. lt is no longer needed. This single line does everything.

grp3 n = map (\x -> [(x-2),(x-1),x]) [3..n]

The following is most general. It will group by any amount and it will include sublists of groups less than the group size specified for completeness.

grp n ls = [(take n . drop x $ ls) | x <- [0,n..]]

5/7/2018 I do this too often. I find relatively quickly that some of my functions can change character with only minor changes in the code. I am careless about versions. This last version generates the wrong type of list.

take 6 $ grp 3 [1..]

[[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]

Take the n out of the enumeration

grp n ls = [(take n.drop i$ls)| (i,x) <- zip [0..] ls]

and

take 6 $ grp 3 [1..]

[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8]]

I added the zip to both now to limit the output.

grp 3 [1..10]

[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10],[10]]
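
If the short trailing groups are not wanted, one possible follow-up is to stop at the first group that is not full. This is a sketch, assuming the zip-limited grp above; the name `fullGroups` is made up for illustration.

```haskell
-- grp as defined above: one group per starting position, so the last
-- n - 1 groups of a finite list are shorter than n.
grp :: Int -> [a] -> [[a]]
grp n ls = [take n (drop i ls) | (i, _) <- zip [0 :: Int ..] ls]

-- Hypothetical helper: keep groups only while they have exactly n elements.
fullGroups :: Int -> [a] -> [[a]]
fullGroups n = takeWhile ((== n) . length) . grp n
```

Note that drop i walks the list from the start for every group, so this does more work than the tails-based answers on long inputs.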

What's wrong with a recursive function? You should also look at the `splitAt` function. — dfeuer, Mar 21 '18 at 01:32
Read some of the comments here and others in Stack Overflow. In fact the term stack overflow is from recursive functions not able to handle a large data set. Some recursive functions choke too easily with a very long list and infinite lists are impossible. That said, the sheer beauty of recursive functions is what is most attractive. – fp_mora, Mar 21 '18 at 20:10
Recursive functions don't have to lead to stack growth. Read the classic "Debunking the expensive procedure call myth" at [readscheme](http://library.readscheme.org/page1.html) to start. Haskell's evaluation model is a bit different, being based on graph reduction. Recursion is the *only* primitive looping construct in Haskell. All other iterative forms are implemented by recursion. The compiler's job is to make that work efficiently, and it's very good at that job. — dfeuer, Mar 21 '18 at 20:36
Well, thank you. I take your words to heart. I was under the impression that Haskell converted primitive recursion to iteration not the other way around. I do talk from experience, I've had stack overflow errors with recursion in Haskell. It leaves a very bad taste in one's mouth. The odd thing is I much prefer recursive functions and mutually recursive functions. The more declarative the function, the more correct and so the better. They make for the easiest to maintain and change also. — fp_mora, Mar 21 '18 at 21:24
The distinction between "recursion" and "iteration" is a slippery one. Once it gets down to assembly code, it's all about putting things in registers and jumping. You *do* have to be careful about how you write your Haskell code to prevent things from blowing up, but the rules for doing so are not the same as the ones in C. For example, the definition `length [] = 0; length (_ : xs) = 1 + length xs` will blow up. But the definition `length = go 0 where go acc [] = acc; go !acc (_ : xs) = go (acc + 1) xs` will work just fine. You'll learn.... — dfeuer, Mar 22 '18 at 04:17
it is customary to put each guard in its own line, while aligning the bars `|` to line up vertically. there's alternative ways to write `if`, e.g. `if | 1==0 -> 0 | 1==1 -> 1 | 1==2 -> 2` with [MultiWayIf](http://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#multi-way-if-expressions). there's also the [bool](https://hackage.haskell.org/package/base-4.11.0.0/docs/Data-Bool.html#v:bool) function, equivalent to "if", but with it the selection is positional, so, not evident in the code. — Will Ness, Mar 26 '18 at 17:40
Indeed and when I create a file the guards are aligned. If I create a file, it is because the logic is much more involved. The layout is essential for reflecting the logic of the code. Some things are even more important such as the specification of the result being correctly understood. — fp_mora, Mar 26 '18 at 22:11
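
The accumulator version of `length` from dfeuer's comment above uses a bang pattern, which needs the BangPatterns extension in Haskell 2010. A minimal sketch of the same idea with plain `seq`, under the hypothetical name `lengthStrict`:

```haskell
-- Hypothetical name. The running count is forced with seq before each
-- recursive call, so no chain of unevaluated (+ 1) thunks builds up.
lengthStrict :: [a] -> Int
lengthStrict = go 0
  where
    go acc []       = acc
    go acc (_ : xs) =
      let acc' = acc + 1
      in acc' `seq` go acc' xs
```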
