
I'm quite new to Haskell and I'm trying to solve the following problem:

I have a function that produces an infinite list of strings of different lengths. The number of strings of any given length is finite, however.

Now I want to extract all strings of a certain length n from that list. I did a lot of research and tried a lot of things, but nothing worked for me.

I know that filter won't work, as it checks every element of the list and results in an infinite loop.

This is my function that generates the infinite list:

allStrings =  [ c : s | s <- "" : allStrings, c <- ['R', 'T', 'P']]

I've already tried this:

allStrings = [x | x <- [ c : s | s <- "" : allStrings, 
                  c <- ['R', 'T', 'P']], length x == 4] 

which didn't terminate.

Thanks for your help!

marc_s
T.Naz
  • `filter` will not result in an infinite loop, since it works in a lazy manner. It is only if you would evaluate the list entirely, that it of course will get stuck in an infinite loop. – Willem Van Onsem Oct 07 '19 at 14:13
  • allStrings = [ c : s | s <- "" : allStrings, c <- ['R', 'T', 'P']] that's my function that generates the list of strings. If I try the following code (which I assume is the same as filter) my ghci starts to hang. allStrings = [ x| x <-[ c : s | s <- "" : allStrings, c <- ['R', 'T', 'P']], length x == 4] – T.Naz Oct 07 '19 at 14:16
  • @WillemVanOnsem wouldn't `print . filter (>0) $ [0..]` "evaluate the list entirely"? – Will Ness Oct 07 '19 at 14:18
  • @T.Naz: that is because your `allStrings` will never emit anything: you say each string should have length `4`, but in order to construct a string of length `4` here, you first need `allStrings` to produce a string of length `3`, etc. – Willem Van Onsem Oct 07 '19 at 14:20
  • @WillNess: yes, since, as said before, `print` will entirely evaluate the list, and thus once you start `print`ing, you will never escape from that anymore so to speak. – Willem Van Onsem Oct 07 '19 at 14:20
  • @WillemVanOnsem I just think "evaluate the list entirely" can be confusing. I wouldn't refer to that process as "stuck". "stuck" implies *non-productive* loops, like `sum . filter (>0) $ [0..]`. – Will Ness Oct 07 '19 at 14:22
  • Well, if I change the '==4' to '<4', my ghci is still stuck. Could you recommend another way to extract the strings of length 4? – T.Naz Oct 07 '19 at 14:27
  • The function fulfills its duty but it doesn't terminate – T.Naz Oct 07 '19 at 14:29

2 Answers


This

allStrings4 = takeWhile ((== 4) . length) . 
                dropWhile ((< 4) . length) $ allStrings

does the trick.

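It works because your (first) allStrings definition cleverly generates all strings made of the letters 'R', 'T', and 'P' in a productive manner, in order of non-decreasing length. Once the first string longer than 4 shows up, no further string of length 4 can follow, so takeWhile can safely stop there.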

Instead of trying to cram it all into one definition, separate your concerns! Build a solution to the more general problem first (this is your allStrings definition), then use it to solve the more restricted problem. This will often be much simpler, especially with the lazy evaluation of Haskell.

We just need to take care that our streams are always productive, never stuck.
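As a minimal sketch of that separation, the same idea can be generalised from 4 to an arbitrary length. The name `stringsOfLength` is hypothetical (it does not appear in the question), and the sketch assumes `n >= 1`, since `allStrings` never emits the empty string:

```haskell
-- The asker's original, productive generator: all non-empty strings
-- over "RTP", emitted in order of non-decreasing length.
allStrings :: [String]
allStrings = [ c : s | s <- "" : allStrings, c <- "RTP" ]

-- Hypothetical generalisation of allStrings4 to any length n:
-- skip the shorter strings, then stop at the first longer one.
stringsOfLength :: Int -> [String]
stringsOfLength n =
  takeWhile ((== n) . length) . dropWhile ((< n) . length) $ allStrings
```

With these definitions, `length (stringsOfLength 4)` should evaluate to 81 (that is, 3^4) and then return, because `takeWhile` stops consuming the infinite list at the first string of length 5.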

Will Ness
  • `allStrings` here refers to your *first* definition. *Never* use same name to refer to different things. Always number them: `allStrings`, `allStrings2`, etc., so there's no confusion. – Will Ness Oct 07 '19 at 14:45
  • "Never use same name to refer to different things" is a bit of a difficult guideline to understand. We typically use `xs` to refer to many different things throughout the course of a program. I don't know exactly what you meant, so I can't suggest a good clarification for you. – amalloy Oct 07 '19 at 21:45
  • I think a better guideline is to avoid name shadowing unless you are very sure you want to do it. The compiler will warn you about this if you enable `-Wall` or `-Wname-shadowing`. – David Fox Oct 07 '19 at 22:03
  • @amalloy I meant "in the same question", ultimately. or more broadly, during exploration "interactive play" at the GHCi prompt. – Will Ness Oct 07 '19 at 22:05
  • (I'm very surprised it was unclear BTW. *obviously* the remark was made in context, I thought, the OP asks about two different definitions with the same name, which is confusing, etc. -- all mentioned in that same comment) – Will Ness Oct 07 '19 at 22:19
  • @DavidFox nothing wrong with intentional shadowing in finished code, obviously, it's just that we lose easy access to the previously defined entity, and lose the ability to easily distinguish between them in speech, while still in the interactive developing / exploring stage. – Will Ness Oct 07 '19 at 22:33
  • @WillNess at various times in the past name shadowing has caused me trouble when one declaration of the symbol got removed or renamed, silently revealing the other one, thereby leading to mayhem. – David Fox Oct 08 '19 at 17:09

The problem is that your filter makes it impossible to generate any solutions. To generate a string of length 4, you first need a string of length 3, since each step prepends one character to an existing string. To generate a string of length 3, you in turn need strings of length 2, and so on, down to the base case: the empty string. Because the filter sits inside the recursive definition, those shorter strings are discarded before they can ever be extended.

The filter itself is not the main problem; the problem is that you filter in such a way that emitting values is now impossible.

We can fix this by building the strings in a separate, unfiltered list, and then filtering that list:

allStrings = filter ((== 4) . length) vals
    where vals = [ c : s | s <- "" : vals, c <- "RTP" ]

This will emit all strings of length 4 and then get stuck in an infinite loop, since filter keeps searching for more matching strings and never finds any.
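A small sketch of what that means in practice: as long as we ask for no more than the 3^4 = 81 matches that exist, evaluation stops. The names `vals` and `lengthFour` are used here for illustration only, `lengthFour` being a hypothetical stand-in for the filtered list:

```haskell
-- Unfiltered, productive generator of all non-empty strings over "RTP".
vals :: [String]
vals = [ c : s | s <- "" : vals, c <- "RTP" ]

-- Filtering happens outside the recursion, so shorter strings
-- are still available for building longer ones.
lengthFour :: [String]
lengthFour = filter ((== 4) . length) vals

-- Requesting exactly the 81 existing matches terminates;
-- requesting an 82nd element would make filter search forever.
firstEightyOne :: [String]
firstEightyOne = take 81 lengthFour
```

Evaluating `length firstEightyOne` returns, whereas `length lengthFour` does not.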

We can however do better, for example by using replicateM :: Applicative m => Int -> m a -> m [a] here, which produces a finite list and therefore terminates:

Prelude Control.Monad> replicateM 4 "RTP"
["RRRR","RRRT","RRRP","RRTR","RRTT","RRTP","RRPR","RRPT","RRPP","RTRR","RTRT","RTRP","RTTR","RTTT","RTTP","RTPR","RTPT","RTPP","RPRR","RPRT","RPRP","RPTR","RPTT","RPTP","RPPR","RPPT","RPPP","TRRR","TRRT","TRRP","TRTR","TRTT","TRTP","TRPR","TRPT","TRPP","TTRR","TTRT","TTRP","TTTR","TTTT","TTTP","TTPR","TTPT","TTPP","TPRR","TPRT","TPRP","TPTR","TPTT","TPTP","TPPR","TPPT","TPPP","PRRR","PRRT","PRRP","PRTR","PRTT","PRTP","PRPR","PRPT","PRPP","PTRR","PTRT","PTRP","PTTR","PTTT","PTTP","PTPR","PTPT","PTPP","PPRR","PPRT","PPRP","PPTR","PPTT","PPTP","PPPR","PPPT","PPPP"]

Note that here the last character varies fastest from one string to the next, whereas in the original allStrings the first character varies fastest. I leave it as an exercise to obtain the reversed result.
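For readers who want to check their answer, here is one possible sketch, not necessarily the solution the exercise has in mind. The names `fixedLength` and `fixedLengthRev` are hypothetical, and reversing each string is just one simple way to make the first character vary fastest:

```haskell
import Control.Monad (replicateM)

-- All strings of length n over "RTP"; the last character varies fastest.
fixedLength :: Int -> [String]
fixedLength n = replicateM n "RTP"

-- Reversing every string makes the first character vary fastest instead.
fixedLengthRev :: Int -> [String]
fixedLengthRev n = map reverse (fixedLength n)
```

For example, `take 4 (fixedLengthRev 4)` should give `["RRRR","TRRR","PRRR","RTRR"]`, and both lists are finite, so `length (fixedLength 4)` simply returns 81.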

Willem Van Onsem