Memory efficient stream traversal using the scalaz Traverse typeclass

Question

I'm trying to traverse/sequence a large stream (e.g. scala.collection.immutable.Stream) using Scalaz' (version 7.1.2) Traverse typeclass, but I'm constantly running into a java.lang.OutOfMemoryError: GC overhead limit exceeded issue.

My traversal looks roughly like the following:

import scalaz._
import Scalaz._
import scala.collection.immutable.Stream

Stream.range(1, 1000000000).traverse[MTrans, Int](f)

where MTrans is a monad transformer stack including EitherT and StateT and f: Int => MTrans[Int].

I'm generally just interested to sequence the elements (passing on state) and only need the final result (MTrans[Int]), not the whole (materialized) sequence/stream.

I have versions running with traverseKTrampoline but that doesn't help since this is not a StackOverflowError issue as described in other similar posts. I also tried combinations of using EphemeralStream and sequence but had no success.

How can I (memory-)efficiently traverse/sequence such a stream?

Update 1

The following is a more complete example of what I'm trying to do. It closely resembles the structure that I have and exhibits the same problem (GC overhead limit exceeds at some point).

object Main {

  import scalaz._
  import Scalaz._
  import Kleisli._

  import scala.collection.immutable.Stream

  import scala.language.higherKinds

  case class IState(s: Int)

  type IStateT[A] = StateT[Id, IState, A]
  type MTransT[S[_], A] = EitherT[S, String, A]
  type MTrans[A] = MTransT[IStateT, A]

  def eval(k: Int): MTrans[Int] = {
    for {
      state <- get[IState].liftM[MTransT]
      _ <- put(state.copy(s = state.s % k)).liftM[MTransT]
    } yield (k + 1)
  }

  def process(i: Int, k: Int): MTrans[Int] = {
    for {
      state <- get[IState].liftM[MTransT]
      _ <- put(state.copy(s = state.s + i)).liftM[MTransT]
      res <- eval(k)
    } yield res
  }

  def run() = {
    val m = Stream
      .range(1, 1000000000)
      .traverseKTrampoline[MTrans, Int, Int](i => kleisli(process(i, _))).run(7)

    m.run(IState(0))
  }
}

Update 2

Based on some input from Eric and from Applicative vs. monadic combinators and the free monad in Scalaz I came up with the following simple foldLeft-based solution using the applicative *>:

val m = Stream
  .range(1, 1000000000)
  .toEphemeralStream
  .foldLeft(0.point[MTrans]) { acc => i =>
    acc *> process(i, 3)
}

While this (still) seems to be stack safe, it requires large amounts of heap space and runs really slow.

Is the issue with StateT maybe which accumulates states until it can "fold" it with an initial value? I wonder if you could transform that to simple foldLeft. Can you post the code for `f`? — Eric, May 22 '15 at 23:46
Just came across http://stackoverflow.com/questions/20756436/monadic-fold-with-state-monad-in-constant-space-heap-and-stack which I think is very much related — Martin Studer, May 23 '15 at 09:31
I'm currently working on a "monadic fold" library which I think will help but it's not ready yet. You can follow me on twitter (@etorreborre) to get the announcement. — Eric, May 23 '15 at 10:05

Memory efficient stream traversal using the scalaz Traverse typeclass

0 Answers0