0

I'm trying to traverse/sequence a large stream (e.g. scala.collection.immutable.Stream) using Scalaz' (version 7.1.2) Traverse typeclass, but I'm constantly running into a java.lang.OutOfMemoryError: GC overhead limit exceeded issue.

My traversal looks roughly like the following:

import scalaz._
import Scalaz._
import scala.collection.immutable.Stream

Stream.range(1, 1000000000).traverse[MTrans, Int](f)

where MTrans is a monad transformer stack including EitherT and StateT and f: Int => MTrans[Int].

I'm generally just interested to sequence the elements (passing on state) and only need the final result (MTrans[Int]), not the whole (materialized) sequence/stream.

I have versions running with traverseKTrampoline but that doesn't help since this is not a StackOverflowError issue as described in other similar posts. I also tried combinations of using EphemeralStream and sequence but had no success.

How can I (memory-)efficiently traverse/sequence such a stream?

Update 1

The following is a more complete example of what I'm trying to do. It closely resembles the structure that I have and exhibits the same problem (GC overhead limit exceeds at some point).

object Main {

  import scalaz._
  import Scalaz._
  import Kleisli._

  import scala.collection.immutable.Stream

  import scala.language.higherKinds

  case class IState(s: Int)

  type IStateT[A] = StateT[Id, IState, A]
  type MTransT[S[_], A] = EitherT[S, String, A]
  type MTrans[A] = MTransT[IStateT, A]

  def eval(k: Int): MTrans[Int] = {
    for {
      state <- get[IState].liftM[MTransT]
      _ <- put(state.copy(s = state.s % k)).liftM[MTransT]
    } yield (k + 1)
  }

  def process(i: Int, k: Int): MTrans[Int] = {
    for {
      state <- get[IState].liftM[MTransT]
      _ <- put(state.copy(s = state.s + i)).liftM[MTransT]
      res <- eval(k)
    } yield res
  }

  def run() = {
    val m = Stream
      .range(1, 1000000000)
      .traverseKTrampoline[MTrans, Int, Int](i => kleisli(process(i, _))).run(7)

    m.run(IState(0))
  }
}

Update 2

Based on some input from Eric and from Applicative vs. monadic combinators and the free monad in Scalaz I came up with the following simple foldLeft-based solution using the applicative *>:

val m = Stream
  .range(1, 1000000000)
  .toEphemeralStream
  .foldLeft(0.point[MTrans]) { acc => i =>
    acc *> process(i, 3)
}

While this (still) seems to be stack safe, it requires large amounts of heap space and runs really slow.

Community
  • 1
  • 1
Martin Studer
  • 2,213
  • 1
  • 18
  • 23
  • Is the issue with StateT maybe which accumulates states until it can "fold" it with an initial value? I wonder if you could transform that to simple foldLeft. Can you post the code for `f`? – Eric May 22 '15 at 23:46
  • Just came across http://stackoverflow.com/questions/20756436/monadic-fold-with-state-monad-in-constant-space-heap-and-stack which I think is very much related – Martin Studer May 23 '15 at 09:31
  • I'm currently working on a "monadic fold" library which I think will help but it's not ready yet. You can follow me on twitter (@etorreborre) to get the announcement. – Eric May 23 '15 at 10:05

0 Answers0