Scala streams are lazy, meaning that they compute values on demand, and they memoize the values once computed. This is problematic if the stream is very large (possibly infinite) and does not fit in memory.
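To make the memoization concrete, here is a minimal sketch. The names MemoDemo, evaluations, counted and run are made up for illustration; only Stream.from, map, take and toList come from the standard library. It counts how many elements get computed when the same prefix is traversed twice:

```scala
object MemoDemo {
  // Hypothetical counter: how many elements have actually been computed.
  var evaluations = 0

  // Each element bumps the counter the first time it is forced.
  def counted(): Stream[Int] =
    Stream.from(1).map { n => evaluations += 1; n }

  // Traverses the same prefix twice and returns the counter after each pass.
  def run(): (Int, Int) = {
    evaluations = 0
    val s = counted()
    s.take(5).toList            // forces the first five elements
    val afterFirst = evaluations
    s.take(5).toList            // reuses the memoized cells, no recomputation
    val afterSecond = evaluations
    (afterFirst, afterSecond)
  }

  def main(args: Array[String]): Unit = {
    val (afterFirst, afterSecond) = run()
    println(s"after first pass: $afterFirst, after second pass: $afterSecond")
  }
}
```

The counter does not grow on the second pass, because the cells forced on the first pass stay attached to s. The same mechanism is what keeps every forced element in memory for as long as the head of the stream is reachable.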
What I want is a method that forwards part of the stream without keeping a reference to it, something like:
def fun(stream: Stream[Int]) = {
  val x = doSomethingWithPrefix(stream.take(10).toList)
  val y = doSomethingWithRestOfStream(stream.drop(10))
  computeResult(x, y)
}
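But this can run out of memory, because the stream parameter keeps the head reachable while the rest of the stream is being traversed: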
scala> def ones = Stream.continually(1)
ones: scala.collection.immutable.Stream[Int]
scala> def f1(stream: Stream[Int]) = {
| stream.take(10).toList
| stream.drop(10).length
| }
f1: (stream: Stream[Int])Int
scala> println(f1(ones.take(100000000)))
java.lang.OutOfMemoryError: GC overhead limit exceeded
// ... stacktrace ...
The solution suggested in many places (e.g. here, pt. 3) is to use Scala's call-by-name parameters, which wrap the argument in a parameterless function that is evaluated each time the parameter is referenced. But this solution is not good here either, because the argument expression then gets evaluated twice:
scala> def f2(stream: => Stream[Int]) = {
| stream.take(10).toList
| stream.drop(10).length
| }
f2: (stream: => Stream[Int])Int
scala> def makeOnes = {
| println("ha")
| ones.take(100000000)
| }
makeOnes: scala.collection.immutable.Stream[Int]
scala> println(f2(makeOnes))
ha
ha
99999990
The only workaround I have right now is to manually inline the doSomethingWithRestOfStream function. For example, this works:
scala> def f3(stream: => Stream[Int]) = {
| var str: Stream[Int] = stream
| str.take(10).toList
| str = str.drop(10)
| var len: Int = 0
| while (!str.isEmpty) {
| str = str.tail
| len = len+1
| }
| len
| }
f3: (stream: => Stream[Int])Int
scala> println(f3(makeOnes))
ha
99999990
Is there any better solution?
Also, if Scala had rebindable function parameters (which it does not), it would be possible to simplify this workaround by taking a normal Stream[Int] instead of => Stream[Int] and using stream in place of str.