
I know that ZIO maintains its own stack, namely zio.internal.FiberContext#stack, which protects recursive functions like

def getNameFromUser(askForName: UIO[String]): UIO[String] =
  for {
    resp <- askForName
    name <- if (resp.isEmpty) getNameFromUser(askForName) else ZIO.succeed(resp)
  } yield name

from stack overflows. However, recursive calls still consume space on this interpreter stack, which can result in an OutOfMemoryError for very deep recursions. How would you rewrite the getNameFromUser function from above so that it does not blow the heap, even when the askForName effect returns empty strings for a very long time?
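For concreteness, a driver that makes askForName answer with empty strings a billion times might look like this (a sketch against ZIO 1.x; the Ref-backed counter is made up for illustration):

```scala
import zio._

// askForName answers "" a billion times before yielding a name,
// driving the recursion in getNameFromUser extremely deep.
val deepProgram: UIO[String] =
  Ref.make(1000000000L).flatMap { remaining =>
    val askForName: UIO[String] =
      remaining.modify(n => (if (n > 0) "" else "Matthias", n - 1))
    getNameFromUser(askForName)
  }
```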

Matthias Langer

2 Answers


You're building a loop out of a recursive function. Essentially, every time getNameFromUser calls itself, you are allocating objects on the heap, and the heap can never free them: the objects created in iteration 1 need the objects created in iteration 2 to resolve, but the objects from iteration 2 need the objects from iteration 3, and so on ad infinitum.

Instead of a hand-rolled loop you should use a ZIO combinator such as forever, or any of the others you can find on Schedule:

import zio._
import zio.console._

val getNameFromUser: RIO[Console, String] = for {
  _    <- putStrLn("What is your name?")
  name <- getStrLn
} yield name

val runUntilNotEmpty = Schedule.doWhile[String](_.isEmpty)

val rt = Runtime.default
rt.unsafeRun(getNameFromUser.repeat(runUntilNotEmpty))

[EDIT] Adding a different example, because all you actually need is:

import zio._
import zio.console._

object ConsoleEx extends App {

  val getNameFromUser = for {
    _    <- putStrLn("What is your name?")
    name <- getStrLn
    _    <- putStr(s"Hello, $name")
  } yield ()

  override def run(args: List[String]) =
    getNameFromUser.fold(t => {println(t); 1}, _ => 0)

}

Note, however, that if you have fork in run := true in your build.sbt, then you will also need to add run / connectInput := true, as explained in the sbt docs.
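For reference, the two settings mentioned above would look like this in build.sbt (a sketch; only needed when forking is enabled):

```scala
// build.sbt — run the program in a forked JVM and forward stdin to it
fork in run := true
run / connectInput := true
```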

toxicafunk

The recommended way to rewrite the function from above is to use an appropriate Schedule, as suggested by toxicafunk, resulting in

def getNameFromUserSchedule(askForName: UIO[String]): UIO[String] =
  askForName.repeat(Schedule.doWhile(_.isEmpty))

This is both concise and readable, and consumes only a constant amount of ZIO stack frames.
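If your ZIO version provides it, the same loop can also be written without naming a Schedule at all, via the repeatUntil combinator (a sketch; repeatUntil only appeared in later ZIO 1.x releases):

```scala
import zio._

// Equivalent to repeat(Schedule.doWhile(_.isEmpty)),
// assuming repeatUntil is available in your ZIO version.
def getNameFromUserUntil(askForName: UIO[String]): UIO[String] =
  askForName.repeatUntil(_.nonEmpty)
```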

However, you don't have to use Schedule to make

def getNameFromUser(askForName: UIO[String]): UIO[String] =
  for {
    resp <- askForName
    name <- if (resp.isEmpty) getNameFromUser(askForName) else ZIO.succeed(resp)
  } yield name

consume a constant amount of ZIO stack frames. It could also be done like so:

def getNameFromUser(askForName: UIO[String]): UIO[String] =
  askForName.flatMap { resp =>
    if (resp.isEmpty) getNameFromUser(askForName) else ZIO.succeed(resp)
  }

This function looks almost like the original in its desugared form, which is

def getNameFromUser(askForName: UIO[String]): UIO[String] =
  askForName.flatMap { resp =>
    if (resp.isEmpty) getNameFromUser(askForName) else ZIO.succeed(resp)
  }.map(identity)

The only difference is the final map(identity). When interpreting a ZIO value produced by this function, the interpreter has to push identity on the stack, compute the flatMap, and then apply identity. However, to compute the flatMap, the same procedure might repeat, forcing the interpreter to push as many identities on the stack as there are loop iterations. This is kind of annoying, but the interpreter cannot know that the functions it pushes on the stack are in fact identities. You can eliminate them without giving up the nice for syntax by using the better-monadic-for compiler plugin, which is able to optimize away the final map(identity) when desugaring for comprehensions.

Without the map(identity), the interpreter will execute askForName, and then use the closure

resp =>
    if (resp.isEmpty) getNameFromUser(askForName) else ZIO.succeed(resp)

to obtain the next ZIO value for interpretation. This procedure might repeat an arbitrary number of times, but the size of the interpreter stack will remain unchanged.
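The effect of that extra map(identity) can be made visible with a tiny model interpreter. The following is a sketch in plain Scala (MiniIO and its runner are made up for illustration and are far simpler than ZIO's actual FiberContext): it runs both shapes of the loop against an explicit continuation stack and records the maximum stack depth reached.

```scala
sealed trait MiniIO[+A]
final case class Succeed[A](value: A)                            extends MiniIO[A]
final case class Effect[A](thunk: () => A)                       extends MiniIO[A]
final case class FlatMap[A, B](io: MiniIO[A], k: A => MiniIO[B]) extends MiniIO[B]

// Interpret a MiniIO with an explicit continuation stack,
// reporting the result together with the maximum stack depth.
def run[A](io0: MiniIO[A]): (A, Int) = {
  var io: MiniIO[Any]                 = io0
  var stack: List[Any => MiniIO[Any]] = Nil
  var depth                           = 0
  var maxDepth                        = 0
  while (true) {
    io match {
      case FlatMap(inner, k) =>
        stack = k.asInstanceOf[Any => MiniIO[Any]] :: stack
        depth += 1
        maxDepth = maxDepth max depth
        io = inner
      case Effect(thunk) =>
        io = Succeed(thunk())
      case Succeed(v) =>
        stack match {
          case Nil     => return (v.asInstanceOf[A], maxDepth)
          case k :: ks => stack = ks; depth -= 1; io = k(v)
        }
    }
  }
  sys.error("unreachable")
}

// The bare flatMap loop.
def loopNoMap(ask: MiniIO[String]): MiniIO[String] =
  FlatMap(ask, (resp: String) => if (resp.isEmpty) loopNoMap(ask) else Succeed(resp))

// The desugared for comprehension: the same loop wrapped in a final map(identity).
def loopWithMap(ask: MiniIO[String]): MiniIO[String] =
  FlatMap(
    FlatMap(ask, (resp: String) => if (resp.isEmpty) loopWithMap(ask) else Succeed(resp)),
    (name: String) => Succeed(identity(name))
  )

// An ask that answers "" the given number of times before yielding a name.
def mkAsk(empties: Int): MiniIO[String] = {
  var n = empties
  Effect(() => if (n > 0) { n -= 1; "" } else "Matthias")
}

val (_, depthNoMap)   = run(loopNoMap(mkAsk(100000)))   // depthNoMap == 1
val (_, depthWithMap) = run(loopWithMap(mkAsk(100000))) // depthWithMap > 100000
```

The bare flatMap loop keeps a constant maximum depth of 1, while the map(identity) variant reaches a depth proportional to the number of iterations, because each pending identity stays on the stack until the whole loop finishes.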

Summarizing, here is a brief discussion about when the ZIO interpreter will use its internal stack:

  1. When computing chained flatMaps, like io0.flatMap(f1).flatMap(f2).flatMap(f3). To evaluate such an expression, the interpreter will push f3 on the stack and look at io0.flatMap(f1).flatMap(f2). Then it will push f2 on the stack and look at io0.flatMap(f1). Finally f1 is pushed on the stack, and io0 is evaluated (there is an optimization in the interpreter that might take a shortcut here, but that's not relevant for this discussion). After io0 has been evaluated to r0, f1 is popped from the stack and applied to r0, giving us a new ZIO value io1 = f1(r0). Now io1 is evaluated to r1, and f2 is popped from the stack to obtain the next ZIO value io2 = f2(r1). Finally, io2 is evaluated to r2, f3 is popped from the stack to obtain io3 = f3(r2), and io3 is interpreted to r3, the final result of the expression. Thus, if you have an algorithm that works by chaining together flatMaps, you should expect the maximum depth of the ZIO stack to be at least the length of your chain of flatMaps.
  2. When computing chained folds, like io.foldM(h1, f1).foldM(h2, f2).foldM(h3, f3), or mixtures of chained folds and chained flatMaps. If there are no errors, folds behave like flatMaps, so the analysis regarding the ZIO stack is quite similar. You should expect the maximum depth of the ZIO stack to be at least the length of your chain.
  3. When applying the above rules, keep in mind that there are many combinators that are directly or indirectly implemented on top of flatMap and foldCauseM:
    • map, as, zip, zipWith, <*, *>, foldLeft, foreach are implemented on top of flatMap
    • fold, foldM, catchSome, catchAll, mapError are implemented on top of foldCauseM
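To illustrate the last point, here is roughly how two of these combinators reduce to flatMap and foldCauseM (a simplified sketch with the same shape as, but not copied from, ZIO's actual source):

```scala
import zio._

// map is just flatMap followed by succeed ...
def map[R, E, A, B](io: ZIO[R, E, A])(f: A => B): ZIO[R, E, B] =
  io.flatMap(a => ZIO.succeed(f(a)))

// ... and foldM is foldCauseM with the typed error extracted from the Cause,
// so both consume interpreter stack frames in exactly the same way.
def foldM[R, E, E2, A, B](io: ZIO[R, E, A])(
    failure: E => ZIO[R, E2, B],
    success: A => ZIO[R, E2, B]
): ZIO[R, E2, B] =
  io.foldCauseM(cause => cause.failureOrCause.fold(failure, ZIO.halt(_)), success)
```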

Last but not least: You should not worry too much about the size of ZIO's internal stack, unless

  • you are implementing an algorithm where the number of iterations might become arbitrarily large for only moderately or even constantly sized input data
  • you are traversing very large data structures that don't fit into memory
  • a user can influence the stack depth directly with very little effort (that is, without having to send you large amounts of data over the network, for example)
Matthias Langer