(In my actual use case I have a list of type [SomeType], SomeType having a finite number of constructors, all nullary; in the following I'll use String instead of [SomeType] and use only 4 Chars, to simplify a bit.)

I have a list like this "aaassddddfaaaffddsssadddssdffsdf" where each element can be one of 'a', 's', 'd', 'f', and I want to do some further processing on each contiguous sequence of non-'a's, let's say turning them upper case and reversing the sequence, thus obtaining "aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD". (I've added the reversing requirement to make it clear that the processing involves all the contiguous non-'a's at the same time.)

To upper-case and reverse a sub-String, I can use this:

func :: String -> String
func = reverse . map Data.Char.toUpper

But how do I run that func only on the sub-Strings of non-'a's?

My first thought is that Data.List.groupBy can be useful, and the overall solution could be:

concat $ map (\x -> if head x == 'a' then x else func x)
       $ Data.List.groupBy ((==) `on` (== 'a')) "aaassddddfaaaffddsssadddssdffsdf"

This solution, however, does not convince me, as I'm using == 'a' both when grouping (which to me seems good and unavoidable) and when deciding whether I should turn a group upper case.
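For reference, here is the attempt above as a self-contained sketch with the imports it needs. The name `attempt` is introduced here only for illustration. Note that `head` is safe in this spot because `groupBy` never returns an empty group.

```haskell
import Data.Char (toUpper)
import Data.Function (on)
import Data.List (groupBy)

-- The processing step from the question: upper-case and reverse one run.
func :: String -> String
func = reverse . map toUpper

-- The attempt from the question, wrapped in a (hypothetical) named function.
-- `== 'a'` appears twice: once to group, once to decide whether to apply func.
attempt :: String -> String
attempt = concatMap (\x -> if head x == 'a' then x else func x)
        . groupBy ((==) `on` (== 'a'))
```

Loading this in ghci and evaluating `attempt "aaassddddfaaaffddsssadddssdffsdf"` should give the target string from the question.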

I'm looking for advice on how I can accomplish this small task in the best way.

Enlico

5 Answers

If we need to remember the difference between the 'a's and the rest, let's put them in different branches of an Either. In fact, let's define a newtype while we are at it:

{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE ViewPatterns #-}

import Data.Bifoldable
import Data.Bifunctor
import Data.Char
import Data.List

newtype Bunched a b = Bunched [Either a b] deriving (Functor, Foldable)

instance Bifunctor Bunched where
  bimap f g (Bunched b) = Bunched (fmap (bimap f g) b)

instance Bifoldable Bunched where
  bifoldMap f g (Bunched b) = mconcat (fmap (bifoldMap f g) b)

fmap lets us work over the non-separators (the Right values). fold returns the concatenation of the non-separators only, while bifold returns the concatenation of everything. Of course, we could have defined separate functions unrelated to Foldable and Bifoldable, but why avoid already existing abstractions?
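As a small illustration of the difference between the three operations, here is a sketch on a hand-built value. The names `sample`, `onlyRights`, `everything` and `processed` are made up for this example, and it assumes separators go in Left and everything else in Right.

```haskell
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor #-}

import Data.Bifoldable (Bifoldable (..))
import Data.Bifunctor (Bifunctor (..))
import Data.Char (toUpper)
import Data.Foldable (fold)

newtype Bunched a b = Bunched [Either a b] deriving (Functor, Foldable)

instance Bifunctor Bunched where
  bimap f g (Bunched b) = Bunched (fmap (bimap f g) b)

instance Bifoldable Bunched where
  bifoldMap f g (Bunched b) = mconcat (fmap (bifoldMap f g) b)

-- Hypothetical hand-built value: Left holds separators, Right holds the rest.
sample :: Bunched String String
sample = Bunched [Left "aa", Right "sd", Left "a", Right "f"]

-- fold only sees the Right values.
onlyRights :: String
onlyRights = fold sample                        -- "sdf"

-- bifold sees both branches, in their original order.
everything :: String
everything = bifold sample                      -- "aasdaf"

-- fmap touches only the Right values; the separators pass through unchanged.
processed :: String
processed = bifold (fmap (map toUpper) sample)  -- "aaSDaF"
```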

To split the list, we can use an unfoldr that alternately searches for 'a's and non-'a's with the span function:

splitty :: Char -> String -> Bunched String String
splitty c str = Bunched $ unfoldr step (True, str)
  where
    step (_, []) = Nothing
    step (True, span (== c) -> (as, ys)) = Just (Left as, (False, ys))
    step (False, span (/= c) -> (xs, ys)) = Just (Right xs, (True, ys))

Putting it to work:

ghci> bifold . fmap func . splitty 'a' $ "aaassddddfaaaffddsssadddssdffsdf"
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
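To see what the unfoldr step produces before anything is concatenated, the same stepping logic can be run without the newtype wrapper. This is only a sketch for inspection; `runs`, `example` and `leading` are hypothetical names, not part of the answer.

```haskell
{-# LANGUAGE ViewPatterns #-}

import Data.List (unfoldr)

-- Same stepping logic as splitty, minus the Bunched wrapper, so the
-- intermediate list can be inspected directly.
runs :: Char -> String -> [Either String String]
runs c str = unfoldr step (True, str)
  where
    step (_, []) = Nothing
    step (True, span (== c) -> (as, ys)) = Just (Left as, (False, ys))
    step (False, span (/= c) -> (xs, ys)) = Just (Right xs, (True, ys))

example :: [Either String String]
example = runs 'a' "aaassdaf"
-- [Left "aaa",Right "ssd",Left "a",Right "f"]

-- An input that starts with a non-separator yields an empty Left first,
-- which is harmless once everything is concatenated again.
leading :: [Either String String]
leading = runs 'a' "sa"
-- [Left "",Right "s",Left "a"]
```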

Note: Bunched is actually the same as Tannen [] Either from the bifunctors package, if you don't mind the extra dependency.

danidiaz

You could classify the list elements by the predicate before grouping. Note that I’ve reversed the sense of the predicate to indicate which elements are subject to the transformation, rather than which elements are preserved.

{-# LANGUAGE ScopedTypeVariables #-}

import Control.Arrow ((&&&))
import Data.Char (toUpper)
import Data.Function (on)
import Data.List (groupBy)
import Data.Monoid (First(..))

mapSegmentsWhere
  :: forall a. (a -> Bool) -> ([a] -> [a]) -> [a] -> [a]
mapSegmentsWhere p f
  = concatMap (applyMatching . sequenceA)  -- [a]
  . groupBy ((==) `on` fst)                -- [[(First Bool, a)]]
  . map (First . Just . p &&& id)          -- [(First Bool, a)]
  where
    applyMatching :: (First Bool, [a]) -> [a]
    applyMatching (First (Just matching), xs)
      = applyIf matching f xs

    applyIf :: forall a. Bool -> (a -> a) -> a -> a
    applyIf condition f
      | condition = f
      | otherwise = id

Example use:

> mapSegmentsWhere (/= 'a') (reverse . map toUpper) "aaassddddfaaaffddsssadddssdffsdf"
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"

Here I use the First monoid with sequenceA to merge the lists of adjacent matching elements from [(Bool, a)] to (Bool, [a]), but you could just as well use something like map (fst . head &&& map snd). You can also skip the ScopedTypeVariables if you don’t want to write the type signatures; I just included them for clarity.
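A minimal sketch of that merging step in isolation, on a single hand-written group (the names `group1`, `merged` and `mergedPlain` are made up for this example):

```haskell
import Control.Arrow ((&&&))
import Data.Monoid (First (..))

-- One group as produced by groupBy: every element carries the same tag.
group1 :: [(First Bool, Char)]
group1 = [(First (Just True), 's'), (First (Just True), 'd')]

-- The pair Applicative combines the first components with (<>);
-- First keeps the leftmost Just, so the shared tag survives.
merged :: (First Bool, String)
merged = sequenceA group1
-- (First {getFirst = Just True},"sd")

-- The alternative from the text, on plain Bool tags. It relies on the
-- group being non-empty, which holds for groups returned by groupBy.
mergedPlain :: (Bool, String)
mergedPlain = (fst . head &&& map snd) [(True, 's'), (True, 'd')]
-- (True,"sd")
```

The second form avoids the Maybe inside First, at the cost of a partial `head`.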

Jon Purdy
  • For my own record, playing a bit, I've verified that `&&&`, in this specific case, is equivalent to `(,) <$> (First . Just . p) <*> id`; and given its description ([_send the input to both argument arrows and combine their output_](https://hackage.haskell.org/package/base-4.14.0.0/docs/Control-Arrow.html#v:-38--38--38-)), it seems exactly that; to rephrase it, my understanding is that `f &&& g` is equal to `(,) <$> f <*> g`. – Enlico Sep 27 '20 at 21:37
  • So, in a way, you use `p` once for each character, but you don't throw away that result (as I do in my attempt instead), but you store it using the trick of the pair; then you still rely on the first `Bool` of each group, but you do it via the `First` monoid. However this makes you use a `Maybe` where it's not really needed, no? However, I still need some time to go through the `concatMap (applyMatching . sequenceA)` part. – Enlico Sep 27 '20 at 21:57
  • @Enlico: Yeah, `&&&` (which I pronounce “and” or “both…and…of…”) in the `->` arrow is just a more concise way of doing `liftA2 (,) f g`. The `Maybe` is unfortunate, I agree, but the problem is that you can’t use `Data.Semigroup.First` (which doesn’t have a `Maybe`, and consequently only has a `Semigroup` instance, no `Monoid`) because `sequenceA` to go from `[(a, b)]` to `(a, [b])` requires `Monoid a`. There’s probably a way around it that I’m not seeing. It would help if `groupBy` returned `[NonEmpty a]` and there were more `Semigroup`-only tools, but that ecosystem isn’t fleshed out yet. – Jon Purdy Sep 28 '20 at 16:54
  • I like it, and I've understood most of it (well, writing this code myself out of nothing is another story). I only have a doubt about how `sequenceA` works in this case. I mean, I see the result and see it's reasonable, but I don't understand how it happens. I've asked the question [here](https://stackoverflow.com/questions/64177058/how-does-sequencea-work-on-lists-of-pairs). – Enlico Oct 02 '20 at 19:12

There are other answers here, but I think they get too excited about iteration abstractions. A manual recursion, alternately taking things that match the predicate and things that don't, makes this problem exquisitely simple:

onRuns :: Monoid m => (a -> Bool) -> ([a] -> m) -> ([a] -> m) -> [a] -> m
onRuns p = go p (not . p) where
    go _ _ _ _ [] = mempty
    go p p' f f' xs = case span p xs of
        (ts, rest) -> f ts `mappend` go p' p f' f rest

Try it out in ghci:

Data.Char> onRuns ('a'==) id (reverse . map toUpper) "aaassddddfaaaffddsssadddssdffsdf"
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
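Because onRuns only asks for a Monoid result, the same traversal can produce something other than a String. The sketch below repeats the definition and adds two illustrative uses; `runsOf` and `nonACount` are hypothetical names chosen for this example.

```haskell
import Data.Char (toUpper)
import Data.Monoid (Sum (..))

-- onRuns exactly as defined above.
onRuns :: Monoid m => (a -> Bool) -> ([a] -> m) -> ([a] -> m) -> [a] -> m
onRuns p = go p (not . p) where
    go _ _ _ _ [] = mempty
    go p p' f f' xs = case span p xs of
        (ts, rest) -> f ts `mappend` go p' p f' f rest

-- Collect the runs instead of concatenating them: the monoid is [String].
-- An input starting with a non-'a' produces an empty first run.
runsOf :: String -> [String]
runsOf = onRuns ('a' ==) (: []) (: [])

-- Count the elements outside the 'a' runs using the Sum monoid.
nonACount :: String -> Int
nonACount = getSum . onRuns ('a' ==) (const mempty) (Sum . length)
```

For instance, `runsOf "aaassdaf"` evaluates to `["aaa","ssd","a","f"]` and `nonACount "aaassdaf"` to `4`.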
Daniel Wagner

Here is a simple solution - function process below - that only requires that you define two functions isSpecial and func. Given a constructor from your type SomeType, isSpecial determines whether it is one of those constructors that form a special sublist or not. The function func is the one you included in your question; it defines what should happen with the special sublists.

The code below is for character lists. Just change isSpecial and func to make it work for your lists of constructors.

import Data.Char (toUpper)

isSpecial c = c /= 'a'
func = reverse . map toUpper

turn = map (\x -> ([x], isSpecial x)) 

amalgamate []  = []
amalgamate [x] = [x]
amalgamate ((xs, xflag) : (ys, yflag) : rest)
   | xflag /= yflag = (xs, xflag) : amalgamate ((ys, yflag) : rest)
   | otherwise      = amalgamate ((xs++ys, xflag) : rest)

work = map (\(xs, flag) -> if flag then func xs else xs)

process = concat . work . amalgamate . turn

Let's try it on your example:

*Main> process "aaassddddfaaaffddsssadddssdffsdf"
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
*Main> 

Applying one function at a time shows the intermediate steps:

*Main> turn "aaassddddfaaaffddsssadddssdffsdf"
[("a",False),("a",False),("a",False),("s",True),("s",True),("d",True),
("d",True),("d",True),("d",True),("f",True),("a",False),("a",False),
("a",False),("f",True),("f",True),("d",True),("d",True),("s",True),
("s",True),("s",True),("a",False),("d",True),("d",True),("d",True),
("s",True),("s",True),("d",True),("f",True),("f",True),("s",True),
("d",True),("f",True)]
*Main> amalgamate it
[("aaa",False),("ssddddf",True),("aaa",False),("ffddsss",True),
("a",False),("dddssdffsdf",True)]
*Main> work it
["aaa","FDDDDSS","aaa","SSSDDFF","a","FDSFFDSSDDD"]
*Main> concat it
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
*Main> 
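The adaptation to a list of constructors mentioned earlier might look like the sketch below. `Key` and its constructors are a hypothetical stand-in for the question's SomeType, and since upper-casing has no counterpart for such a type, func here only reverses.

```haskell
-- Hypothetical stand-in for SomeType: four nullary constructors.
-- Eq and Show are derived only for convenience when experimenting.
data Key = A | S | D | F
  deriving (Eq, Show)

-- Pattern matching replaces the comparison with 'a', so the
-- classification itself needs no Eq instance.
isSpecial :: Key -> Bool
isSpecial A = False
isSpecial _ = True

-- Assumed processing step for this sketch: just reverse the run.
func :: [Key] -> [Key]
func = reverse

turn :: [Key] -> [([Key], Bool)]
turn = map (\x -> ([x], isSpecial x))

-- Merge neighbouring entries that carry the same flag.
amalgamate :: [([a], Bool)] -> [([a], Bool)]
amalgamate []  = []
amalgamate [x] = [x]
amalgamate ((xs, xflag) : (ys, yflag) : rest)
   | xflag /= yflag = (xs, xflag) : amalgamate ((ys, yflag) : rest)
   | otherwise      = amalgamate ((xs ++ ys, xflag) : rest)

work :: [([Key], Bool)] -> [[Key]]
work = map (\(xs, flag) -> if flag then func xs else xs)

process :: [Key] -> [Key]
process = concat . work . amalgamate . turn
```

With these definitions, `process [A, S, D, A, F, D]` evaluates to `[A,D,S,A,D,F]`.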
Håkan

We can just do what you describe, step by step, getting clear, simple, minimal code that is easy to read and understand later on:

import Data.Function (on)
import Data.List (groupBy)

foo :: (a -> Bool) -> ([a] -> [a]) -> [a] -> [a]
foo p f xs = [ a
         | g <- groupBy ((==) `on` fst) 
                      [(p x, x) | x <- xs]  -- [ (True, 'a'), ... ]
         , let (t:_, as) = unzip g          -- ( [True, ...], "aaa" )
         , a <- if t then as else (f as) ]  -- final concat

         -- unzip :: [(b, a)] -> ([b], [a])

We break the list into same-p spans and unpack each group with the help of unzip. Trying it out:

> foo (=='a') reverse "aaabcdeaa"
"aaaedcbaa"
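To make the grouping and unzipping stages visible, here is a sketch that stops before the final concatenation. The names `tagged` and `unpacked` are hypothetical helpers introduced only for this illustration.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- The tagging and grouping stage of foo on its own.
tagged :: (a -> Bool) -> [a] -> [[(Bool, a)]]
tagged p xs = groupBy ((==) `on` fst) [(p x, x) | x <- xs]

-- Each group unzips into its (constant) tags and its elements.
unpacked :: [([Bool], String)]
unpacked = map unzip (tagged (== 'a') "aabca")
-- [([True,True],"aa"),([False,False],"bc"),([True],"a")]
```

Only the first tag of each group is ever inspected, which is why the pattern `(t:_, as)` is enough.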

So no, using == 'a' is avoidable and hence not especially good: it introduces an unnecessary Eq constraint on your data type when all we need is equality on Booleans.

Will Ness
  • You can match the originally-requested type a bit closer by adding an additional clause, as in `[ a | ... , a <- if t then as else f as ]`. ...and if you want to go all in on list comprehensions, how about `[(p x, x) | x <- xs]` for the `zip`+`map` bit? – Daniel Wagner Feb 14 '21 at 14:44
  • ah yes I missed the final concat, didn't I. and the bit about zip+map is also good, thanks. see, LCs *are* nice and easy to work with. :) – Will Ness Feb 14 '21 at 16:55