
Scala streams are lazy, meaning that they compute values on demand and memoize them. This is problematic when the stream is very large (possibly infinite) and does not fit in memory.

What I want is a method that forwards part of the stream without keeping a reference to its head, something like:

def fun(stream: Stream[Int]) = {
  val x = doSomethingWithPrefix(stream.take(10).toList)
  val y = doSomethingWithRestOfStream(stream.drop(10))
  computeResult(x,y)
}

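But this can produce an OutOfMemoryError, because the stream parameter keeps the head reachable while length walks (and memoizes) the rest: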

scala> def ones = Stream.continually(1)
ones: scala.collection.immutable.Stream[Int]

scala> def f1(stream: Stream[Int]) = {
     |   stream.take(10).toList
     |   stream.drop(10).length
     | }
f1: (stream: Stream[Int])Int

scala> println(f1(ones.take(100000000)))
java.lang.OutOfMemoryError: GC overhead limit exceeded
// ... stacktrace ...

The solution suggested in many places (e.g. here, pt. 3) is to use Scala's pass-by-name parameters, which wrap the argument in a parameterless function that is evaluated each time the parameter is referenced. But this solution is not good here either, because the function then gets evaluated twice:

scala> def f2(stream: => Stream[Int]) = {
     |   stream.take(10).toList
     |   stream.drop(10).length
     | }
f2: (stream: => Stream[Int])Int

scala> def makeOnes = {
     |   println("ha")
     |   ones.take(100000000)
     | }
makeOnes: scala.collection.immutable.Stream[Int]

scala> println(f2(makeOnes))
ha
ha
99999990

The only workaround I have right now is to manually inline the doSomethingWithRestOfStream function. For example, this works:

scala> def f3(stream: => Stream[Int]) = {
     |   var str: Stream[Int] = stream
     |   str.take(10).toList
     |   str = str.drop(10)
     |   var len: Int = 0
     |   while (!str.isEmpty) {
     |     str = str.tail
     |     len = len+1
     |   }
     |   len
     | }
f3: (stream: => Stream[Int])Int

scala> println(f3(makeOnes))
ha
99999990

Is there any better solution?

Also, if Scala had rebindable function parameters (which it does not), it would be possible to simplify this workaround by taking a normal Stream[Int] instead of => Stream[Int] and using stream in place of str.
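One way to approximate a rebindable parameter might be a local tail-recursive helper: with @tailrec, the recursive call is compiled into a loop that overwrites the helper's parameters, much like str = str.tail in f3. The sketch below is an assumption on my part, not a verified fix; the names f4, loop, prefixRev and taken are hypothetical, and I have not measured it against the 100000000-element stream. It relies on the by-name argument being evaluated exactly once, directly as the argument of the helper, so that f4 itself never stores the head.

```scala
import scala.annotation.tailrec

// Hypothetical variant of f3: returns the first 10 elements and the
// number of elements after them, walking the stream in a single pass.
def f4(stream: => Stream[Int]): (List[Int], Int) = {
  // `rest` plays the role of the rebindable parameter: each recursive
  // call replaces it with its tail instead of keeping the old cell.
  @tailrec
  def loop(rest: Stream[Int], prefixRev: List[Int], taken: Int, len: Int): (List[Int], Int) =
    if (rest.isEmpty) (prefixRev.reverse, len)
    else if (taken < 10) loop(rest.tail, rest.head :: prefixRev, taken + 1, len)
    else loop(rest.tail, prefixRev, taken, len + 1)

  // The by-name argument is referenced only here, so it is evaluated once.
  loop(stream, Nil, 0, 0)
}
```

The drawback is the same as with f3: the logic that consumes the rest of the stream has to live inside the loop rather than in a separate doSomethingWithRestOfStream function.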

matix2267
  • Perhaps this question is related - http://stackoverflow.com/questions/20360161/streamfilter-runs-out-of-memory-for-1-000-000-items – Kevin Meredith May 07 '15 at 15:17
  • What about `Iterator`s? (Though they can be visited only once. Because of that -as it seems you need multiple visits- you might need `Iterable`s.) – Gábor Bakos May 07 '15 at 15:23
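A minimal sketch of the Iterator idea from the comment above, assuming a single pass over the data is enough. The name f5 is hypothetical. An Iterator does not memoize, so elements that have already been consumed can be collected. The prefix is read with explicit next() calls because, as far as I know, the standard library documentation says an iterator should be discarded after calling take on it.

```scala
// Hypothetical Iterator-based counterpart of f1: returns the first 10
// elements and the number of elements that follow them.
def f5(it: Iterator[Int]): (List[Int], Int) = {
  val prefix = List.newBuilder[Int]
  var taken = 0
  // Pull at most 10 elements directly from the iterator.
  while (taken < 10 && it.hasNext) {
    prefix += it.next()
    taken += 1
  }
  // size consumes whatever is left; nothing already read is retained.
  (prefix.result(), it.size)
}
```

It could be called as f5(Iterator.continually(1).take(100000000)). Since the iterator can be traversed only once, anything that needs to revisit elements would have to buffer them explicitly.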
