1

I've seen many questions about Scala collections and could not decide. This question was the most useful until now.

I think the core of the question is twofold: 1) Which are the best collections for this use case? 2) Which are the recommended ways to use them?

Details:

I am implementing an algorithm that iterates over all elements in a collection searching for the one that matches a certain criterion. After the search, the next step is to search again with a new criterion, but without the chosen element among the possibilities. The idea is to create a sequence with all original elements ordered by the criterion (which changes at every new selection). The original sequence doesn't really need to be ordered, but there can be duplicates (the algorithm will only pick one at a time). Example with a small sequence of Ints (just to simplify):

object Foo extends App {
  def f(already_selected: Seq[Int])(element: Int): Double =
  // something more complex happens here,
  // specially something take takes 'already_selected' into account
  math.sqrt(element)

  //call to the algorithm

  val (result, ti) = Tempo.time(recur(Seq.fill(9900)(Random.nextInt), Seq()))
  println("ti = " + ti)

  //algorithm
  def recur(collection: Seq[Int], already_selected: Seq[Int]): (Seq[Int], Seq[Int]) =
     if (collection.isEmpty) (Seq(), already_selected)
     else {
        val selected = collection maxBy f(already_selected)
        val rest = collection diff Seq(selected) //this part doesn't seem to be efficient
        recur(rest, selected +: already_selected)
     }
}

object Tempo {
 def time[T](f: => T): (T, Double) = {
    val s = System.currentTimeMillis
    (f, (System.currentTimeMillis - s) / 1000d)
 }
}
Community
  • 1
  • 1
  • as far as i understand recur does not terminate – Alex Dec 12 '13 at 14:37
  • full compilable code added to question –  Dec 12 '13 at 14:48
  • In my test, Vector and Array are 15% slower than List and Seq. –  Dec 12 '13 at 14:57
  • @davips Seq is not a specific class, it's default implementation is List. On the other hand IndexedSeq default implementation is Vector. Note also that there is a pile of things [which complicate benchmarking of JVM code](http://www.ibm.com/developerworks/java/library/j-benchmark1/). It does not seems that you consider them. – om-nom-nom Dec 12 '13 at 15:58
  • And yes, diffing to throw away just a single element is a likely bottleneck, not collections innerings. – om-nom-nom Dec 12 '13 at 16:04
  • 1
    I don't think the `diff` matters much here, since `maxBy` is an O(n) operation (maybe more). For the benefit of structural sharing when modifying the old `Seq`, you can consider `zipWithIndex` and this solution: http://stackoverflow.com/questions/12864505/how-can-i-idiomatically-remove-a-single-element-from-a-list-in-scala-and-close – lcn Dec 12 '13 at 17:45
  • @om-nom-nom I agree about guidelines for benchmark, but it is not the case since I am testing directly my use case. It is not my intent to generalize any conclusions from here to other problems. I will try with `patch` and `inline` to see the difference. –  Dec 12 '13 at 22:52
  • `patch` takes two times longer than `diff`. Changed lines: `val (selected, index) = collection.zipWithIndex maxBy {case (elem, idx) => f(already_selected)(elem)}` and `val rest = collection.patch(index, Nil, 1)`. –  Dec 12 '13 at 22:57
  • `@tailrec`and `@inline` make no difference. Scala 2.10.2. –  Dec 12 '13 at 23:26

1 Answers1

0

Try @inline and as icn suggested How can I idiomatically "remove" a single element from a list in Scala and close the gap?:

object Foo extends App {
  @inline
  def f(already_selected: Seq[Int])(element: Int): Double =
  // something more complex happens here,
  // specially something take takes 'already_selected' into account
    math.sqrt(element)

  //call to the algorithm

  val (result, ti) = Tempo.time(recur(Seq.fill(9900)(Random.nextInt()).zipWithIndex, Seq()))
  println("ti = " + ti)

  //algorithm
  @tailrec
  def recur(collection: Seq[(Int, Int)], already_selected: Seq[Int]): Seq[Int] =
    if (collection.isEmpty) already_selected
    else {
      val (selected, i) = collection.maxBy(x => f(already_selected)(x._2))
      val rest = collection.patch(i, Nil, 1) //this part doesn't seem to be efficient
      recur(rest, selected +: already_selected)
    }
}

object Tempo {
  def time[T](f: => T): (T, Double) = {
    val s = System.currentTimeMillis
    (f, (System.currentTimeMillis - s) / 1000d)
  }
}
Community
  • 1
  • 1
Alex
  • 8,518
  • 4
  • 28
  • 40
  • The result is almost two times slower than the original. I will not profile it, but instead take as an advice que keep things simple. –  Dec 12 '13 at 23:24
  • the original took more than 200ms and this one took 150ms on my machine. – Alex Dec 14 '13 at 15:12
  • Maybe it is the JDK, my Java version is "1.6.0_39". –  Dec 18 '13 at 16:47
  • java version "1.7.0_45" Java(TM) SE Runtime Environment (build 1.7.0_45-b18) Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode) – Alex Dec 20 '13 at 01:58