Eliminate consecutive duplicates from a string

Question

I want to eliminate consecutive duplicates from a string like f "aaabbbcccdeefgggg" = "abcdefg"

This is my code

f :: String -> String
f "" = ""
f "_" = "_"
f (x : xs : xss)
    | x == xs   = f (xs : xss)
    | otherwise = x : f (xs : xss)

I get a "non-exhaustive patterns" error. I think it is caused by the second line: the program doesn't know what to do when only one character is left. How should I fix it?

When there is only 1 char left in the string; I think this is the problem. — , Jan 19 '22 at 10:09
You only process one particular single-char string; there are more of them. — bipll, Jan 19 '22 at 10:09
`"_"` matches the literal string `"_"`. I guess what you want is `[x]` to match a singleton list. — michid, Jan 19 '22 at 10:10
I believe the issue is best summarized as follows: the pattern `"a"` is a shorthand for the pattern `['a']` (or `'a':[]`) where `'a'` is a character literal -- this is not the same pattern as `[a]` (or `a:[]`) where `a` here is a variable name. In your case, you have the character literal `'_'` vs the wildcard pattern `_`, but the principle is the same -- these are distinct constructs. — chi, Jan 19 '22 at 18:07

Willem Van Onsem · Answer 1 · 2022-01-19T10:24:48.227

The "_" pattern does not match any one-character string; it matches only the literal string that consists of a single underscore.
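The difference is easy to see with two small predicates. This is a minimal sketch; the names isUnderscore and isSingleton are made up for illustration and are not part of the question's code:

```haskell
-- "_" in a pattern is a string literal: it matches only the string "_".
isUnderscore :: String -> Bool
isUnderscore "_" = True
isUnderscore _   = False

-- [_] is a list pattern containing a wildcard: it matches any one-element list.
isSingleton :: String -> Bool
isSingleton [_] = True
isSingleton _   = False
```

In GHCi, isUnderscore "a" gives False, while isSingleton "a" gives True.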

You can use [_] as the pattern for a singleton string:

f :: String -> String
f "" = ""
f s@[_] = s
f (x : xs : xss)
    | x == xs   = f (xs : xss)
    | otherwise = x : f (xs : xss)

Here the as-pattern s@[_] binds the whole one-character string to s.

Or we can simplify this to:

f :: String -> String
f (x : xs : xss)
    | x == xs   = f (xs : xss)
    | otherwise = x : f (xs : xss)
f s = s

Enlico · Answer 2 · 2023-05-09T15:02:25.293

I want to eliminate consecutive duplicates from a string like f "aaabbbcccdeefgggg" = "abcdefg"

You can group the letters which are equal (via Data.List.group), and then take the first of each group (via map head, which applies head to each element of the list and gives you back the list of the results):

import Data.List (group) -- so we write group instead of Data.List.group
map head $ group "aaabbbcccdeefgggg"

This can be seen as applying map head after applying group to the input String. Therefore, your f can be defined as the composition of those two functions:

f :: String -> String
f = map head . group

For completeness, as you seem to be new to Haskell, here are a few details:

Data.List.group "aaabbbcccdeefgggg" returns ["aaa","bbb","ccc","d","ee","f","gggg"];
f $ a b c is the same as f (a b c);
. is the composition operator, and it is such that (f . g) x == f (g x).
- Unrelated to this question, but important to bear in mind in general: . operates on unary functions, which means that if g takes more than one argument, the composition will pipe a partially applied g to f as soon as you give f . g one argument. In other words, if g is, say, binary, then f (g x y) is not equal to (f . g) x y, which will generally not even compile, but to ((f .) . g) x y. There are more examples of this, in JavaScript, in another answer of mine, which should be fairly readable for a Haskell programmer.
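The points above can be tried out in one small module. This is only a sketch: runs, dedupDollar, dedup and negSum are hypothetical names chosen here, and negSum uses negate and (+) merely as stand-ins for a unary f and a binary g.

```haskell
import Data.List (group)

-- group splits a list into runs of equal adjacent elements.
runs :: [String]
runs = group "aaabbbcccdeefgggg"

-- ($) only saves parentheses: this is map head (group s).
dedupDollar :: String -> String
dedupDollar s = map head $ group s

-- Point-free form using (.); head is safe here because group never
-- produces an empty run.
dedup :: String -> String
dedup = map head . group

-- Composing a unary function after a binary one needs the extra section:
-- negSum x y is negate (x + y).
negSum :: Int -> Int -> Int
negSum = (negate .) . (+)
```

Writing negate . (+) instead would be rejected by the type checker at Int, because negate would be applied to the partially applied function (+) x.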

score 7 · Answer 3 · answered Jan 19 '22 at 10:18

Or you can make it even simpler by handling only the duplicate case explicitly and letting everything else fall through to the later clauses:

f :: String -> String
f (x:y:xs) | x == y = f (y:xs)
f (x:xs) = x:f xs
f _ = ""
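This relies on clause fall-through: when the pattern of a clause matches but its guard fails, and there is no otherwise branch, Haskell moves on to the next clause. A minimal sketch of the same mechanism in isolation (classify is a hypothetical example, not part of the answer):

```haskell
classify :: Int -> String
classify n | n < 0 = "negative"  -- guard fails for n >= 0, so matching continues below
classify 0 = "zero"
classify _ = "positive"
```

In f above, a pair of unequal neighbours fails the guard x == y in the same way and is then handled by the second clause, which keeps x.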

bipll


dfeuer · Answer 4 · 2022-01-20T08:55:38.913

score 4

To avoid looking ahead further than necessary, you shouldn't try to match on the first two elements. Instead, keep track of the most recent element.

f :: Eq a => [a] -> [a]
f = start
  where
    start [] = []
    start (x : xs) = x : go x xs

    go _old [] = []
    go old (x : xs)
      | x == old
      = go old xs
      | otherwise
      = x : go x xs
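One way to observe the difference in lookahead is to feed each approach a list whose tail is undefined or infinite. The sketch below restates the function above under the hypothetical name dedupLazy, next to a two-element-pattern version under the hypothetical name dedupPeek:

```haskell
-- Tracks the previous element and inspects one cons cell at a time.
dedupLazy :: Eq a => [a] -> [a]
dedupLazy [] = []
dedupLazy (x : xs) = x : go x xs
  where
    go _ [] = []
    go old (y : ys)
      | y == old  = go old ys
      | otherwise = y : go y ys

-- Matches on the first two elements, so it has to evaluate the tail
-- before it can produce anything.
dedupPeek :: Eq a => [a] -> [a]
dedupPeek (x : y : rest)
  | x == y    = dedupPeek (y : rest)
  | otherwise = x : dedupPeek (y : rest)
dedupPeek s = s
```

Here head (dedupLazy ('a' : undefined)) returns 'a', whereas head (dedupPeek ('a' : undefined)) forces the undefined tail and fails. On infinite input, take 3 (dedupLazy (cycle "aab")) yields "aba".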

If you want, you can also write this as a fold, where you track whether you've seen an element yet using a Maybe:

f :: Eq a => [a] -> [a]
f xs = foldr go stop xs Nothing
  where
    stop _ = []
    go x r (Just old)
      | x == old = r (Just old)
    go x r _ = x : r (Just x)

Some may find the fold easier to read if it's rearranged a bit.

f :: Eq a => [a] -> [a]
f xs = (foldr go stop xs) Nothing
  where
    stop :: Maybe a -> [a]
    stop = \_ -> []

    go :: Eq a => a -> (Maybe a -> [a]) -> (Maybe a -> [a])
    go x r = \acc -> case acc of
      Just old
        | x == old -> r (Just old)
      _ -> x : r (Just x)
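To see what the fold stands for, recall that foldr go stop replaces each (:) in the list with go and the final [] with stop, so the folded result is a function still waiting for its Maybe argument. Written as explicit recursion, under the hypothetical names dedupRec and walk, that might look like this:

```haskell
dedupRec :: Eq a => [a] -> [a]
dedupRec xs = walk xs Nothing
  where
    -- walk ys plays the role of r: the fold of the rest of the list,
    -- still expecting the most recently emitted element.
    walk :: Eq a => [a] -> Maybe a -> [a]
    walk [] _ = []                        -- corresponds to stop
    walk (y : ys) (Just old)
      | y == old = walk ys (Just old)     -- duplicate: skip it
    walk (y : ys) _ = y : walk ys (Just y) -- new element: emit and remember it
```

The fourth argument to foldr is therefore not an argument of foldr itself; it is applied to the function that foldr returns.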

edited Jan 20 '22 at 08:55

answered Jan 20 '22 at 02:48


at least indent your `= ...`s at least two more measly spaces than your `| ...`s. the guards and the RHSs are not the same thing. the cases should be clearly visually separated from one another. – Will Ness Jan 20 '22 at 07:46
@dfeuer Please explain your second solution: how does foldr work with 4 arguments in this example? I know we can pass 4 arguments to foldr, but I can't follow it here. – S4eed3sm Jan 20 '22 at 08:45
@WillNess, I like the equals signs to line up with the guard pipes. I guess I got used to that reading GHC code? – dfeuer Jan 20 '22 at 08:52
@s4seed3sm, I wrote a version that's rearranged a bit. Does that help? If not, can you explain what you find confusing? You should try to "translate" the fold version to the recursive function it represents. – dfeuer Jan 20 '22 at 08:56
with the explicit `f [] = []` we don't need the superfluous `Maybe` anymore and get to avoid the repeated construction/deconstruction of this data structure at each step. there is no real choice here -- we create it ourselves after all, in the fully not only predictable but known in advance pattern. more, we know we'll use the `Nothing` only once, at the very start. this needless creation of data to express known-in-advance control is fully an anti-pattern. (unconditional choice is no choice at all). it sometimes leads to a much shorter and cleaner code, and is near unavoidable; not here. – Will Ness Jan 20 '22 at 17:08
@WillNess, that's true from a clarity standpoint, but not from a list fusion standpoint. Guard it like that and you won't get fusion. – dfeuer Jan 20 '22 at 17:40
@dfeuer interesting, thanks. and your 2nd version, does it fuse? is `Just` de/constructed all the time or somehow is eliminated altogether? – Will Ness Jan 20 '22 at 21:26
if the problem is the two explicit clauses then they can be replaced by [another `foldr` call](https://gist.github.com/treeowl/a864c00e3ba5b9662c200307413ac918#gistcomment-4035748). will _that_ get fused properly? – Will Ness Jan 20 '22 at 21:51
My code optimizes properly when marked `INLINABLE` and compiled with either `-O2` or `-O -fspec-constr`. See [this gist](https://gist.github.com/treeowl/dc4dbea63b6f5e887eed64f9a89b0e32). Fusing on the other side would require `build` and probably some `INLINE` fiddling. – dfeuer Jan 20 '22 at 21:55
@WillNess, your code will not fuse because `drop` doesn't work with list fusion at all. – dfeuer Jan 20 '22 at 21:55
does `tail` work? it is safe to call it there. – Will Ness Jan 20 '22 at 21:57
@WillNess, no, same thing. More fundamentally, though, you use `xs` *twice*, which is utterly incompatible with fusion. – dfeuer Jan 20 '22 at 22:01
dang. but if Maybes disappear, that's not needed after all. thanks for the clarifications. – Will Ness Jan 20 '22 at 22:02
@WillNess, yeah, if you look at the Core I pasted, you'll see it compiles to something that looks just like the two stage hand-written version. But as `potato` shows, it also fuses nicely. – dfeuer Jan 20 '22 at 22:04
[just one last thing](https://gist.github.com/treeowl/a864c00e3ba5b9662c200307413ac918#gistcomment-4035779) please. does it really not get fused just because of the two clauses, or I have misunderstood you? – Will Ness Jan 20 '22 at 22:29
In the first solution, is there some benefit to writing `f = start; start [] = ...` rather than `f [] = ...`? It seems like a pointless indirection to me. – amalloy Jan 21 '22 at 05:39
@amalloy, it lets me avoid worrying about name shadowing warnings if I bind the same variable names in the patterns. – dfeuer Jan 21 '22 at 05:40

S4eed3sm · Answer 5 · 2022-01-20T10:41:43.457

score 1

You could also use foldl. The idea is to compare the last element of the accumulator with the current element.

f :: Eq a => [a] -> [a]
f [] = []  -- handle the empty list separately, because head below is partial
f xs = foldl (\acc y -> if last acc == y then acc else acc ++ [y]) [head xs] xs

Here we initialize the accumulator with [head xs].

Based on @dfeuer's hint, I changed my solution:

-- with foldl
f :: Eq a => [a] -> [a]
f xs = snd $ foldl  opr (Nothing, []) xs
  where
    opr (Just old, acc) n
      | old == n = (Just old, acc)
      | otherwise = (Just n, acc ++ [n])
    opr (Nothing, acc) n = (Just n, acc ++ [n])

-- with foldr
f2 :: Eq a =>[a] -> [a]
f2 xs = snd $ foldr opr (Nothing, []) xs
  where
    opr n (Just old, acc)
      | old == n = (Just old, acc)
      | otherwise = (Just n, n:acc)
    opr n (Nothing, acc) = (Just n, n:acc)

Thanks to @dfeuer, I learned a lot of new things. This is the third version based on comments:

-- with foldl
f :: Eq a => [a] -> [a]
f xs = snd $ foldl opr (Nothing, []) xs
  where
    opr (old, acc) n =
      ( Just n,
        case old of
          Just o
            | o == n -> acc
            | otherwise -> acc ++ [n]
          Nothing -> acc ++ [n]
      )

-- with foldr
f2 :: Eq a => [a] -> [a]
f2 xs = snd $ foldr opr (Nothing, []) xs
  where
    opr n (old, acc) =
      ( Just n,
        case old of
          Just o
            | o == n -> acc
            | otherwise -> n : acc
          Nothing -> n : acc
      )
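On the cost of acc ++ [n] in the foldl versions: each append copies the whole accumulator, so the total work grows quadratically with the length of the output. One common way around that, offered here as a sketch and not as part of the answer, is to cons onto a reversed accumulator and reverse once at the end (dedupRev and step are hypothetical names):

```haskell
import Data.List (foldl')

dedupRev :: Eq a => [a] -> [a]
dedupRev = reverse . foldl' step []
  where
    -- The head of the accumulator is the most recently kept element.
    step acc@(prev : _) n
      | prev == n = acc   -- duplicate of the previous element: drop it
    step acc n = n : acc  -- empty accumulator or a new element: keep it
```

This runs in linear time, but it still has to consume the whole input before it returns anything, so it cannot work on infinite lists.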

edited Jan 20 '22 at 10:41

answered Jan 19 '22 at 19:25


Can you see why that would be needlessly strict *and* hideously inefficient? Using partial functions like `head` and `last` when you don't need to is also frowned upon. An interesting challenge for you: write an *efficient*, *lazy* version using `foldr` instead of `foldl`. Hint: define appropriate `go` and `stop` to fill in the blanks in this definition: `f :: Eq a => [a] -> [a]; f xs = foldr _go _stop xs Nothing`. Yes, passing four arguments to `foldr` is *intentional*. – dfeuer Jan 20 '22 at 02:26
The second version is much less inefficient, and much cleaner. Nice work! It's still too strict, which will lead to garbage collection inefficiency for lists that aren't pretty short. – dfeuer Jan 20 '22 at 09:10
You can make your second version properly lazy using the fact that the result "shapes" are always the same, and the fact that in the first case `old = n`. Can you see how? – dfeuer Jan 20 '22 at 09:14
Try it first. Then check against this: https://gist.github.com/treeowl/a864c00e3ba5b9662c200307413ac918 – dfeuer Jan 20 '22 at 09:31
Never mind, I don't think that modification is enough.... – dfeuer Jan 20 '22 at 14:25
