If I have the list [1,2,3,4,5,6,7]
and I want to group 3 (or any other number) adjacent values so I end up with the list: [[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7]]
How would I go about doing this in Haskell?
Data.List> f n xs = zipWith const (take n <$> tails xs) (drop (n-1) xs)
Data.List> f 3 [1..7]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7]]
A brief explanation: tails gives us the lists that start at each position in the original. We only want the first few elements of each of these, so we run take n on each. This gets us most of the way there, but leaves a few extra dangling lists at the end; in your example they would be the ones starting from 6, 7, and beyond, so [[6,7],[7],[]]. We could remove them by computing the length of the input list and keeping only the right number of lists, but this doesn't work well for infinite input lists or partially defined input lists. Instead, since the output should always be n-1 elements shorter than the input, we use a standard-ish zipWith trick to cut off the extra elements.
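The zipWith trick can be looked at on its own. The sketch below is only an illustration: the names windows and dropLastK are hypothetical (the answer calls its function f), and dropLastK isolates the trimming step so it can be tried separately.

```haskell
import Data.List (tails)

-- Hypothetical name for the answer's f.
windows :: Int -> [a] -> [[a]]
windows n xs = zipWith const (take n <$> tails xs) (drop (n - 1) xs)

-- The trimming trick in isolation (hypothetical helper):
-- zipWith stops at the shorter list, and (drop k xs) is k elements
-- shorter than xs, so the result is xs without its last k elements.
-- No call to length is needed, so the result is still lazy.
dropLastK :: Int -> [a] -> [a]
dropLastK k xs = zipWith const xs (drop k xs)
```

For example, dropLastK 2 [1..5] keeps [1,2,3], and take 3 (windows 2 [1..]) still returns on an infinite input.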
An alternative explanation of @DanielWagner's solution from a slightly higher level of abstraction:
Original solution:
f n xs = zipWith const (take n <$> tails xs) (drop (n-1) xs)
take n <$> tails xs uses the nondeterminism monad:
type NonDet a = [a] -- instance Monad NonDet
tails :: [a] -> NonDet [a]
tails nondeterministically "chooses" where the sublist begins, and then the pure function
take n :: [a] -> [a]
is fmap'd under the NonDeterminism layer to chop the tail off. This leaves some flab at the result's end, so we go into the plumbing of NonDet/[] with zipWith to fix it.
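To make the "choose a starting point, then apply a pure function" reading concrete, here is a small sketch in do-notation. The name candidates is hypothetical; it only reproduces the take n <$> tails xs part, so the short lists at the end are still present.

```haskell
import Data.List (tails)

-- Hypothetical name: the untrimmed first half of the solution.
candidates :: Int -> [a] -> [[a]]
candidates n xs = do
  suffix <- tails xs        -- nondeterministically choose a starting position
  return (take n suffix)    -- pure step, lifted into the list monad
```

With this definition, candidates 3 [1..4] evaluates to [[1,2,3],[2,3,4],[3,4],[4],[]], which shows the flab that zipWith then removes.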
This new explanation also opens up an optimization. The [] monad has a concept of failure, which is the empty list. If we had a version of take that would fail monadically when it had a too-short argument, we could use it and not worry about removing the short sublists at the end of the result. So:
import Control.Monad((<=<))
import Data.Maybe(maybeToList)
-- Maybe is the simplest failure monad.
-- Doesn't return [[a]] because this could conceivably be used in other
-- contexts and Maybe [a] is "smaller" and clearer than [[a]].
-- "safe" because, in the context of (do xs <- safeTake n ys),
-- length xs == n, definitely.
safeTake :: Int -> [a] -> Maybe [a]
safeTake 0 _ = return []
safeTake n [] = Nothing
safeTake n (x:xs) = (x:) <$> (safeTake $ n - 1) xs
-- maybeToList :: Maybe a -> [a]
-- maybeToList (return x) = return x / maybeToList (Just x) = [x]
-- maybeToList empty = empty / maybeToList Nothing = [ ]
-- (.) :: (b -> c) -> (a -> b) -> (a -> c)
-- (<=<) :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
f n = maybeToList . safeTake n <=< tails
f no longer digs through the nondeterminism abstraction with something that is outside the monad. It can also be written in terms of Kleisli composition, which certainly gives it points in the beauty category. A criterion benchmark also shows a 15-20% speedup (under -O2). Personally, I think it's cool that seeing something more abstractly and making the code "prettier" can also confer performance.
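For readers less used to (<=<), the same pipeline can be spelled out in do-notation. This is a sketch only: windowsDo is a hypothetical name, and safeTake is restated so the block compiles by itself.

```haskell
import Data.List (tails)
import Data.Maybe (maybeToList)

-- Restated from the answer so this block is self-contained.
safeTake :: Int -> [a] -> Maybe [a]
safeTake 0 _      = Just []
safeTake _ []     = Nothing
safeTake n (x:xs) = (x :) <$> safeTake (n - 1) xs

-- Hypothetical name: the Kleisli composition written out step by step.
windowsDo :: Int -> [a] -> [[a]]
windowsDo n xs = do
  suffix <- tails xs                -- choose a starting position
  maybeToList (safeTake n suffix)   -- Nothing becomes [], i.e. this choice fails
```

A suffix that is too short contributes nothing to the result, so no trimming step is needed afterwards.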
let x:y:ls = [1,2,3,4,5,6,7] in zip3 (x:y:ls) (y:ls) ls
Will give
[(1,2,3),(2,3,4),(3,4,5),(4,5,6),(5,6,7)]
This gives tuples instead of lists. If you want lists, map \(a, b, c) -> [a, b, c] over the result, or do
let x:y:ls = [1,2,3,4,5,6,7] in [[a, b, c] | (a, b, c) <- zip3 (x:y:ls) (y:ls) ls]
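The let x:y:ls = ... pattern fails at run time when the list has fewer than two elements. A possible variation, sketched here with the hypothetical name triples, uses drop and zipWith3 instead, so short inputs simply give an empty result. It is still fixed to windows of size 3.

```haskell
-- Hypothetical name: sliding windows of exactly three elements.
-- zipWith3 stops at the shortest argument, which is (drop 2 xs),
-- so no partial pattern match is involved.
triples :: [a] -> [[a]]
triples xs = zipWith3 (\a b c -> [a, b, c]) xs (drop 1 xs) (drop 2 xs)
```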
As an alternative to Daniel's very clever answer, you can use take and explicit recursion.
-- Empty-list case: without it, f 1 xs ends in a pattern-match failure.
f _ [] = []
f n (x:xs) | length xs < (n-1) = []
           | otherwise = (x : take (n-1) xs) : f n xs
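However, this will end up being substantially slower, since length xs has to be forced at every step, and each call traverses the whole remaining list. For the same reason it never produces a result on an infinite list.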
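One way to keep the explicit recursion while avoiding length on the whole remaining list is to measure only the window itself. This is a sketch under the assumption that n >= 1; windowsRec is a hypothetical name.

```haskell
-- Hypothetical name: explicit recursion, assuming n >= 1.
-- Only the candidate window (at most n elements) is measured,
-- so each step costs O(n) and infinite inputs are fine.
windowsRec :: Int -> [a] -> [[a]]
windowsRec n xs
  | length window < n = []                        -- ran out of elements
  | otherwise         = window : windowsRec n (drop 1 xs)
  where
    window = take n xs
```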
It doesn't appear that Haskell will iterate over a list 3 at a time, and list comprehensions won't either. Without iteration, it seems, there is no way to handle infinite lists. IDK, I'm too new to Haskell. All I could come up with is a recursive function. Ugh.
A feature of the function is that it can be parameterized, so it can produce sub-lists of arbitrary size. Haskell pattern matching requires specifying the number of elements in each sub-list: for 3 it is (x:y:z:xs), and Haskell will reduce xs by 3 on each iteration; four at a time would be (w:x:y:z:xs). This introduces bad hard-coding into the function. So this function has to reduce xs using drop 3, but the 3 can be a parameter, as it can for take 3. A helper function that takes the size of each sub-list and passes the constant [] (the empty list) to the primary function as the first parameter would be helpful.
fl2 (ys) (xs) = if null xs then ys else (fl2 (ys ++ [take 3 xs]) (drop 3 xs))
fl2 [] [1..12] ......... [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
What is interesting is that the pair [take 3 l, drop 3 l], used in a function, is good for one step: [[1,2,3],[4,5,6,7,8,9,10,11,12]]. This is basically what the fl2 function uses, but the sub-lists must accumulate.
Edit 3/19/18
Messing around with this I found, to my surprise, that this works. I could probably clean it up a lot but, for now ...
f7 ls = (take 3 ls, drop 3 ls)
f8 (a,b) = if null b then [a] else a : (f8 (f7 b))
How this is run is not pretty, but...
f8 $ f7 [1..12]
produces [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]. This IMO is still better than passing a [] as a parameter.
This last function, and the previous one, also handle short inputs: [1] gives [[1]], and a list whose length is not a multiple of 3 gets a shorter final sub-list. (For [], fl2 gives [] while f8 $ f7 [] gives [[]].) None of this was a consideration when writing the function, but it is a result.
Edit 3/23/2018
Well, thanks to dfeuer, I tried splitAt instead of (\xs -> (take 3 xs, drop 3 xs)). I also changed the syntax of the single line function to not use if-then-else. Calling the function is still the same. A third wrapper function might well be in order.
f7 = (splitAt 3)
f8 (a,b) | null b = [a] | True = a : (f8 $ f7 b)
I am smitten by the use of pattern matching guards in single line functions. If-then-else is so ugly. I agree that 'if' should be a function like it is in lisp-like languages.
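The f7/f8 pair can be folded into one function that takes the chunk size as a parameter, which removes the need for a wrapper. The following is only a sketch: chunks is a hypothetical name, and it assumes n >= 1 (with n = 0 it would never make progress on a non-empty list).

```haskell
-- Hypothetical name: non-overlapping chunks of size n, assuming n >= 1.
-- splitAt does the take/drop pair in a single pass.
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = front : chunks n rest
  where
    (front, rest) = splitAt n xs
```

Here chunks 3 [1..12] gives the same four sub-lists as f8 $ f7 [1..12], and an empty input gives [].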
Edit 3/26/2018
Errrr. I don't know how I got the specification wrong. The result list is [1,2,3],[2,3,4],[3,4,5], not [1,2,3],[4,5,6],[7,8,9]. I feel dumber than normal. Here are two revised functions to produce the correct result list. The first generates all possible triples in range, because the sub-lists of the result are triples.
lt = [[a,b,c] | a <- [1..10], b <- [2..11], c <- [3..12]]
The second function picks out the correct elements of 'lt' by a calculated index value. For the 12-element input list the index values are 0, 111, 222, ..., 999, so they are multiples of 111. So here, for an input list of [1..12]:
[lt!!(x*111) | x <- [0..9]]
Produces
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10,11],[10,11,12]]
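The reason the indices are multiples of 111 can be checked with a little arithmetic. In the sketch below, indexOf and windowsFromLt are hypothetical names; lt is restated from above so the block compiles by itself.

```haskell
-- lt as defined above: 10 * 10 * 10 = 1000 triples.
lt :: [[Int]]
lt = [[a, b, c] | a <- [1..10], b <- [2..11], c <- [3..12]]

-- Hypothetical helper: position of [a, b, c] inside lt.
-- c varies fastest, then b, then a, so the index is a base-10 number
-- whose digits are the offsets (a - 1), (b - 2) and (c - 3).
indexOf :: Int -> Int -> Int -> Int
indexOf a b c = 100 * (a - 1) + 10 * (b - 2) + (c - 3)

-- The window starting at x + 1 is [x+1, x+2, x+3]; all three offsets
-- equal x, so its index is 100x + 10x + x = 111x.
windowsFromLt :: [[Int]]
windowsFromLt = [lt !! indexOf (x + 1) (x + 2) (x + 3) | x <- [0..9]]
```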
Edit 3/27/2018
Well, in the list generated to pick values from, the last value of each set was the next one needed in any list. I was recently taught to look closely and together at the lists generated. I generated a few lists from the lt generating function above. The last element of each list was exactly the value needed for any size list. lt is no longer needed. This single line does everything (for an input list of [1..n]).
grp3 n = map (\x -> [(x-2),(x-1),x]) [3..n]
The following is the most general. It will group by any amount and, for completeness, it will include trailing sublists shorter than the specified group size.
grp n ls = [(take n . drop x $ ls) | x <- [0,n..]]
Edit 5/7/2018
I do this too often. I find relatively quickly that some of my functions can change character with only minor changes in the code. I am careless about versions. This last version generates the wrong kind of list.
take 6 $ grp 3 [1..]
[[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]
Take the n out of the enumeration
grp n ls = [(take n.drop i$ls)| (i,x) <- zip [0..] ls]
and
take 6 $ grp 3 [1..]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8]]
I added the zip to both now to limit the output.
grp 3 [1..10]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10],[10]]
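The output above still ends with the short groups [9,10] and [10], which the question did not ask for. One possible way to drop them, sketched with the hypothetical name grpFull, is to stop at the first window that is shorter than n.

```haskell
import Data.List (tails)

-- Hypothetical name: sliding windows of exactly n elements.
-- Once one window is too short, all later ones are too,
-- so takeWhile can stop there; this also works on infinite lists.
grpFull :: Int -> [a] -> [[a]]
grpFull n = takeWhile ((== n) . length) . map (take n) . tails
```

With this, grpFull 3 [1..10] ends at [8,9,10].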