7

I'd like to know if there is an elegant way to achieve something like this:

val l = Stream.from(1)

val parts = l.some_function(3)  //any number

parts.foreach( println(_) )

> 1,4,7,10... 
> 2,5,8,11...
> 3,6,9,12...

Actually, I need this operation on Streams for parallelization: to split the data across multiple actors without loading everything into memory.

Mikhail Golubtsov

6 Answers

5

The answer to "Split a scala list into n interleaving lists" fully meets the requirements; here it is, slightly modified to suit Streams:

def round[A](seq: Iterable[A], n: Int) = {
  (0 until n).map(i => seq.drop(i).sliding(1, n).flatten)
}
round(Stream.from(1),3).foreach(i => println(i.take(3).toList))
List(1, 4, 7)
List(2, 5, 8)
List(3, 6, 9)
Mikhail Golubtsov
2

The only thing I can think of:

def distribute[T](n: Int)(x: Stream[T]) = (0 until n).map { p =>
  x.zipWithIndex.collect {
    case (e,i) if i % n == p => e
  }
}

It's kind of ugly because each of the sub-streams has to traverse the entire main stream. But I don't think you can mitigate that while preserving (apparent) immutability.

Have you thought of dispatching individual tasks to actors and having a "task distributor" that does exactly this?
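To make the "task distributor" idea concrete without committing to any actor library, here is a minimal sketch that walks the source exactly once and hands each element to the next worker in turn. The names `TaskDistributorSketch`, `dispatch` and `demo` are hypothetical, and the `A => Unit` callbacks are only a stand-in for actor mailboxes; this is an illustration under those assumptions, not the API of any actor framework.

```scala
import scala.collection.mutable.ListBuffer

object TaskDistributorSketch {
  // Single pass over the source: element k goes to worker (k % workers.size).
  // `workers` is a hypothetical stand-in for actor mailboxes.
  def dispatch[A](source: Iterator[A], workers: IndexedSeq[A => Unit]): Unit = {
    require(workers.nonEmpty, "need at least one worker")
    var next = 0
    source.foreach { elem =>
      workers(next)(elem)
      next = (next + 1) % workers.size // wrap around, round-robin
    }
  }

  // Demo: three workers that simply record what they receive.
  def demo(): Vector[List[Int]] = {
    val received = Vector.fill(3)(ListBuffer.empty[Int])
    val workers = received.map(buf => (x: Int) => { buf += x; () })
    dispatch(Iterator.from(1).take(12), workers)
    received.map(_.toList)
  }
}
```

Because the source is consumed through an `Iterator`, nothing is retained after an element has been handed off, which is the property the question asks for. With real actors, the callback would presumably be replaced by a message send.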

gzm0
  • Yes, I thought of that. I have to merge the results from the actors, and the problem is that the intermediate results consume a lot of memory too, so I want only a few actors and the same number of tasks/results. Nevertheless, I could retrofit the actors to reuse results from previous tasks, and I'll go that way if there is no simple way to split a stream. – Mikhail Golubtsov Jun 14 '13 at 19:34
2

A simple approach involves generating an arithmetic sequence for the indices you want and then mapping that to the stream. The apply method will pull out the corresponding values:

def f[A]( s:Stream[A], n:Int ) =
  0 until n map ( i => Iterator.iterate(0)(_+n) map ( s drop i ) )

f( Stream from 1, 3 ) map ( _ take 4 mkString "," )
// Vector(1,4,7,10, 2,5,8,11, 3,6,9,12)

A more performant solution would employ an iterator whose next method simply returns the value from the stream at the next index in the arithmetic sequence:

def comb[A]( s:Stream[A], first:Int, step:Int ):Iterator[A] = new Iterator {
  var i       = first - step
  def hasNext = true
  def next    = { i += step; s(i) }
}
def g[A]( s:Stream[A], n:Int ) =
  0 until n map ( i => comb(s,i,n) )

g( Stream from 1, 3 ) map ( _ take 4 mkString "," )
// Vector(1,4,7,10, 2,5,8,11, 3,6,9,12)

You mentioned that this was for actors, though -- if this is Akka, perhaps you could use a round-robin router.

UPDATE: The above (apparently incorrectly) assumes that there could be more work to do as long as the program is running, so hasNext always returns true; see Mikhail's answer for a version that works with finite streams as well.

UPDATE: Mikhail has determined that a prior StackOverflow question actually has an answer that works for both finite and infinite Streams (although it doesn't look like it would perform nearly as well as an iterator).
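For reference, an iterator-based variant that also terminates on finite input can be sketched with `Iterator.grouped`: drop the first `first` elements, cut the rest into groups of `step`, and keep the head of each group. The object name `CombSketch` and the method `split` are hypothetical; this is one possible way to write it, not the code from either of the answers mentioned above.

```scala
object CombSketch {
  // Elements at indices first, first + step, first + 2 * step, ...
  // A trailing partial group is still non-empty, so `head` is safe.
  def comb[A](s: Iterable[A], first: Int, step: Int): Iterator[A] =
    s.iterator.drop(first).grouped(step).map(_.head)

  // One iterator per residue class 0 until n; each walks the source independently.
  def split[A](s: Iterable[A], n: Int): IndexedSeq[Iterator[A]] =
    (0 until n).map(i => comb(s, i, n))
}
```

For example, `CombSketch.split(1 to 10, 3).map(_.toList)` should give the parts `List(1, 4, 7, 10)`, `List(2, 5, 8)` and `List(3, 6, 9)`. Note that `grouped` reads a whole group ahead, so each iterator forces up to `step - 1` more elements of the source than it returns.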

AmigoNico
  • Creating iterators looks good. The only issue is that in your implementation hasNext always returns true, so it only handles infinite collections; for the general case the code will be more complex. I used actors from the standard Scala library, but it seems that Akka is worth learning, thanks. – Mikhail Golubtsov Jun 16 '13 at 10:09
  • See also http://stackoverflow.com/questions/11132788/split-a-scala-list-into-n-interleaving-lists?lq=1 for the finite case. – AmigoNico Jun 16 '13 at 17:39
  • Ouch! The "sliding" function with a step did the trick. It works for streams too, so writing a custom iterator can be avoided. – Mikhail Golubtsov Jun 16 '13 at 20:15
0
scala> (1 to 30 grouped 3).toList.transpose foreach println
List(1, 4, 7, 10, 13, 16, 19, 22, 25, 28)
List(2, 5, 8, 11, 14, 17, 20, 23, 26, 29)
List(3, 6, 9, 12, 15, 18, 21, 24, 27, 30)
kiritsuku
  • `Stream.from(1).grouped(3).toStream.transpose foreach println` hangs in an infinite loop... – gzm0 Jun 14 '13 at 19:27
  • Uhh, that hurts. After looking into the implementation I saw that `transpose` checks if all collections have the same size. While that is a valid check, it hurts the lazy nature of `Streams`. Thanks for pointing that out, I will bring that to the mailing lists... – kiritsuku Jun 14 '13 at 19:40
  • I just asked a question about this. http://stackoverflow.com/questions/17116061/transpose-on-infinite-stream-loops-forever This seems to be a limitation of the builder abstraction. The problem is basically line 170 in `GenericTraversableTemplate.scala` https://github.com/scala/scala/blob/v2.10.2/src/library/scala/collection/generic/GenericTraversableTemplate.scala#L1 – gzm0 Jun 14 '13 at 19:42
  • SO is not the right place to check whether something is a bug, nor are the right people here to decide that. I'll keep that in mind; if no good answer appears, I'll bring it to the mailing lists. – kiritsuku Jun 14 '13 at 19:49
0

I didn't find any such function in the Scala library, so I retrofitted the iterator variant from AmigoNico's answer. The code handles both finite and infinite collections.

  def splitRoundRobin[A](s: Iterable[A], n: Int) = {
    // Iterator over every `step`-th element; `first` is the 1-based position of the first one.
    def comb(s: Iterable[A], first: Int, step: Int): Iterator[A] = new Iterator[A] {
      val iter = s.iterator
      // Element buffered for the next call to `next`, if any.
      var nextElem: Option[A] = iterToNext(first)
      // Advances by `elemsToSkip` elements and returns the last one read,
      // or None if the underlying iterator runs out first.
      def iterToNext(elemsToSkip: Int) = {
        iterToNextRec(None, elemsToSkip)
      }
      def iterToNextRec(next: Option[A], repeat: Int): Option[A] = repeat match {
        case 0 => next
        case _ => if (iter.hasNext) iterToNextRec(Some(iter.next()), repeat - 1) else None
      }
      def hasNext = nextElem.isDefined || {
        nextElem = iterToNext(step)
        nextElem.isDefined
      }
      def next = {
        val result = if (nextElem.isDefined) nextElem.get else throw new IllegalStateException("No next")
        nextElem = None
        result
      }
    }
    // Positions are 1-based, so part i starts at element i + 1; this keeps the parts in source order.
    0 until n map (i => comb(s, i + 1, n))
  }

  splitRoundRobin((1 to 12).toStream, 3) map (_.toList.mkString(","))
 // Vector(1,4,7,10, 2,5,8,11, 3,6,9,12)

  splitRoundRobin(Stream from 1, 3) map (_.take(4).mkString(","))
//> Vector(1,4,7,10, 2,5,8,11, 3,6,9,12)
Mikhail Golubtsov
0
def roundRobin[T](n: Int, xs: Stream[T]) = {
  val groups = xs.grouped(n).map(_.toIndexedSeq).toStream
  (0 until n).map(i => groups.flatMap(_.lift(i)))
}

works in the infinite case:

scala> roundRobin(3, Stream.from(0)).map(_.take(3).force.mkString).mkString(" ")
res6: String = 036 147 258

using flatMap/lift instead of plain map/apply means it works even if the input is finite and the length isn't a multiple of n:

scala> roundRobin(3, Stream.from(0).take(10)).map(_.mkString).mkString(" ")
res5: String = 0369 147 258
Seth Tisue