Apologies if this is a duplicate - I did a few searches and didn't quite find what I need.

We have a performance-critical piece of our application that converts a Play 2.0 Enumerator (which can be thought of as a Stream) of incoming data into a List (or similar). We will use the fold method on Enumerator, and the question is which way of doing so performs best. (I use Stream instead of Enumerator in the code below, but the idea should be the same.)

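import scala.collection.mutable.MutableList
val incoming: Stream[Int] = ???
// Immutable: each step appends to a Vector.
val result: Seq[Int] = incoming.foldLeft(Vector.empty[Int])(_ :+ _)
// Mutable: each step appends in place to a MutableList.
val result2: Seq[Int] = incoming.foldLeft(MutableList.empty[Int])(_ += _).toSeq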

So the question is essentially: how does repeatedly appending to an immutable Vector compare to repeatedly appending to a mutable MutableList or ListBuffer in performance-critical code? We've ruled out plain List because we need O(1) appending (not prepending). But does the mutable data structure buy us anything in terms of performance or garbage collection?

Arjan
dyross
  • why not just use `toList`? – Dylan Dec 27 '12 at 17:39
  • Sorry if it wasn't clear - the application uses an Enumerator from Play 2.0 so it's a bit different. I thought using Stream in the example code would be simpler but it made this a little confusing I guess... How should I fix the question? – dyross Dec 27 '12 at 17:40
  • partly my fault - I failed to notice that you mentioned `Stream`. It should be enough to mention in the question that it is a Play 2.0 Enumerator – Dylan Dec 27 '12 at 17:43
  • Typo? - `val incoming: String[Int]` – Dylan Dec 27 '12 at 17:50
  • http://stackoverflow.com/questions/5446744/difference-between-mutablelist-and-listbuffer?rq=1 provides a pretty good description of how the two data structures work, and seems to pertain to your use case. – Dylan Dec 27 '12 at 17:58
  • I think we're gonna go with ListBuffer because of the final conversion to List. – dyross Dec 28 '12 at 00:32

1 Answer

You are probably best off using ArrayBuffer. On my machine, you get about the following number of appends per second:

preallocated Array[Int]    -- 830M
resized (x2) Array[Int]    -- 263M
Vector.newBuilder + result -- 185M
mutable.ArrayBuffer        -- 125M
mutable.ListBuffer         -- 100M
mutable.MutableList        --  71M
immutable.List + reverse   --  68M
immutable.Vector           --   8M

I assume you're not always just storing ints, and you want all the collections goodness without extra wrappings, so ArrayBuffer is the best-performing solution as long as you only need to append to one end. The lists (ListBuffer and MutableList) support addition at both ends and perform comparably. Vector is horribly slow in comparison. Only use it if you can take advantage of a lot of data sharing, or if you can create it all in one fell swoop (see Vector.newBuilder + result, which is fantastic). Vector is a great data structure for access, iteration, creation, and sparing updates, not for updates all the time.
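As a rough illustration of the two fastest general-purpose options in the table, here is a minimal sketch. The object name `AppendSketch` and the `incoming` iterator are hypothetical stand-ins (the real source in the question is a Play Enumerator), and this is not the benchmark code that produced the numbers above.

```scala
import scala.collection.mutable.ArrayBuffer

object AppendSketch {
  // Hypothetical stand-in for the incoming data.
  def incoming: Iterator[Int] = Iterator.range(0, 5)

  // ArrayBuffer is a full collection, so it can be read
  // (size, last, indexed access) while it is still growing.
  def viaArrayBuffer(): Seq[Int] = {
    val buf = ArrayBuffer.empty[Int]
    incoming.foreach { x =>
      buf += x // amortized O(1) append at the end
    }
    buf.toList // final conversion to an immutable List
  }

  // A builder only accepts elements; nothing useful can be
  // read from it until result() is called at the end.
  def viaVectorBuilder(): Vector[Int] = {
    val b = Vector.newBuilder[Int]
    incoming.foreach(b += _)
    b.result()
  }
}
```

The same appends can be written in the fold style from the question, for example `incoming.foldLeft(ArrayBuffer.empty[Int])(_ += _)`, since `+=` returns the buffer itself.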

Rex Kerr
  • I'm surprised about the difference between 'resized (x2) Array[Int]' and `ArrayBuffer` because the latter is based on `ResizableArray` which uses the same resize strategy. Do you know what causes the slow-down ? – paradigmatic Dec 27 '12 at 20:02
  • @paradigmatic - Mostly boxing of `Int`. – Rex Kerr Dec 27 '12 at 21:05
  • The use of `Int` in my example is oversimplification. Also, we do need to convert it to a `List` or at least some `Seq` at the end. `Vector.newBuilder` or `ListBuffer` sound the best. – dyross Dec 28 '12 at 20:51
  • @RexKerr may I ask how you're creating the perf test output? Custom code or public? – BAR Feb 25 '15 at 01:18
  • @BAR - I don't remember any more what I did. I do this sort of thing _all_ the time. I probably used my own benchmarking tool, Thyme, but I might have used Caliper. – Rex Kerr Feb 25 '15 at 08:42
  • @RexKerr Thanks Rex, I implemented my own. – BAR Jun 27 '15 at 10:38
  • What does `Vector.newBuilder` mean? – WestCoastProjects Mar 08 '21 at 20:19
  • @StephenBoesch - Builders build a particular type of collection. So it means something like `val b = Vector.newBuilder[Int]; ...(code)... b += x ...(more code)...; b.result` to produce a `Vector[Int]`. (If you want to stick in a whole collection of things, you can do `b ++= xs`.) – Rex Kerr Mar 13 '21 at 12:34
  • OK different question: the `Vector.newBuilder+result` outperforms anything but preallocated `Array` so why do you recommend `ArrayBuffer` ? – WestCoastProjects Mar 13 '21 at 14:49
  • @StephenBoesch - Because a builder isn't a collection, so you can't do anything useful with it until you're completely done building. If you don't want to use the collection early (i.e. first you build, then you use), a builder is preferable. – Rex Kerr Mar 14 '21 at 01:19