3

I'd like to write a merge method that takes two iterables and merges them together. (Maybe "merge" is not the best word for what I want, but that's irrelevant for this question.) I'd like this method to be generic so that it works with different concrete iterables.

For example, merge(Set(1,2), Set(2,3)) should return Set(1,2,3) and merge(List(1,2), List(2,3)) should return List(1, 2, 2, 3). I made the following naive attempt, but the compiler complains about the type of res: it is Iterable[Any] instead of A.

def merge[A <: Iterable[_]](first: A, second: A): A = {
    val res = first ++ second
    res
}

How can I fix this compile error? (I'm more interested in understanding how to implement this functionality than in a library that does it for me, so an explanation of why my code does not work would be much appreciated.)

Wickoo

2 Answers

5

Let's start with why your code didn't work. First, you're accidentally using the abbreviated syntax for an existential type rather than a type bound on a higher-kinded type.

// What you wrote is equivalent to this
def merge[A <: Iterable[T] forSome {type T}](first: A, second: A): A
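To see what the existential bound costs, here is a minimal sketch. The names ExistentialDemo, mergeLoose and merged are made up for illustration. It declares the return type the compiler actually infers for the concatenation, so it compiles, but the element and collection types are gone from the signature.

```scala
object ExistentialDemo {
  // With A <: Iterable[_] the element type is unknown, so the only element
  // type the compiler can choose for the concatenation is Any.
  def mergeLoose[A <: Iterable[_]](first: A, second: A): Iterable[Any] =
    first ++ second

  // The elements are all there, but the static type no longer mentions
  // List or Int.
  val merged: Iterable[Any] = mergeLoose(List(1, 2), List(2, 3))
}
```

Declaring the return type as A instead reproduces the error from the question, because Iterable[Any] is not known to be an A.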

Even fixing that, though, doesn't quite get you what you want.

def merge[A, S[T] <: Iterable[T]](first: S[A], second: S[A]): S[A] = {
  first ++ second // CanBuildFrom errors :(
}

This is because ++ doesn't use type bounds to achieve its polymorphism, it uses an implicit CanBuildFrom[From, Elem, To]. CanBuildFrom is responsible for giving an appropriate Builder[Elem, To], which is a mutable buffer which we use to build up the collection of our desired type.
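To make the role of CanBuildFrom and Builder concrete, here is a minimal sketch, assuming Scala 2.12 or earlier (CanBuildFrom was removed in the 2.13 collections redesign). The names BuilderDemo, cbf and built are illustrative only.

```scala
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder

object BuilderDemo {
  // Ask the compiler for the CanBuildFrom that turns a List[Int] plus Int
  // elements into another List[Int].
  val cbf: CanBuildFrom[List[Int], Int, List[Int]] =
    implicitly[CanBuildFrom[List[Int], Int, List[Int]]]

  // The CanBuildFrom hands out a fresh mutable buffer.
  val builder: Builder[Int, List[Int]] = cbf()
  builder += 1           // add a single element
  builder ++= Seq(2, 3)  // add several elements at once

  // result() converts the buffer into the target collection.
  val built: List[Int] = builder.result()
}
```

Methods such as ++ do essentially this internally: they obtain a Builder from the implicit CanBuildFrom, fill it, and call result.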

So that means we're going to have to give it the CanBuildFrom it so desires and everything'll work right?

import collection.generic.CanBuildFrom

// Cannot construct a collection of type S[A] with elements of type A 
// based on a collection of type Iterable[A]
def merge0[A, S[T] <: Iterable[T], That](x: S[A], y: S[A])
  (implicit bf: CanBuildFrom[S[A], A, S[A]]): S[A] = x.++[A, S[A]](y)

Nope :(.

I've added the explicit type arguments to ++ to make the compiler error more relevant. The error is telling us that, because our arbitrary S doesn't override Iterable's ++ with its own, we're using Iterable's implementation, which takes an implicit CanBuildFrom that builds from an Iterable[A] to our S[A].

This is incidentally the problem @ChrisMartin was running into (and this whole thing really is a long-winded comment to his answer).

Unfortunately Scala does not offer such a CanBuildFrom, so it looks like we're gonna have to use CanBuildFrom manually.

So down the rabbit hole we go...

Let's start by noticing that ++ is actually defined in TraversableLike, so we can make our custom merge a bit more general.

def merge[A, S[T] <: TraversableLike[T, S[T]], That](it: S[A], that: TraversableOnce[A])
  (implicit bf: CanBuildFrom[S[A], A, That]): That = ???

Now let's actually implement that signature.

import collection.TraversableLike
import collection.mutable.Builder

def merge[A, S[T] <: TraversableLike[T, S[T]], That](it: S[A], that: TraversableOnce[A])
  (implicit bf: CanBuildFrom[S[A], A, That]): That = {
  // Get our mutable buffer from the CanBuildFrom
  val builder: Builder[A, That] = bf()
  // Fill it with the elements of both collections, in order
  builder ++= it
  builder ++= that
  // Convert the buffer into the target collection
  builder.result()
}

Note that I've changed the GenTraversableOnce[B]* in the signature of ++ to TraversableOnce[A]**. This is because the only way to make Builder's ++= work is to have sequential access***. And that's all there is to CanBuildFrom. It gives you a mutable buffer that you fill with all the values you want, and then you convert the buffer into your desired output collection with result.

scala> merge(List(1, 2, 3), List(2, 3, 4))
res0: List[Int] = List(1, 2, 3, 2, 3, 4)

scala> merge(Set(1, 2, 3), Set(2, 3, 4))
res1: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)

scala> merge(List(1, 2, 3), Set(1, 2, 3))
res2: List[Int] = List(1, 2, 3, 1, 2, 3)

scala> merge(Set(1, 2, 3), List(1, 2, 3)) // Not the same behavior :(
res3: scala.collection.immutable.Set[Int] = Set(1, 2, 3)

In short, the CanBuildFrom machinery lets you build code that deals with the fact that we often wish to automatically convert between different branches of the inheritance graph of Scala's collections, but it comes at the cost of some complexity and occasionally unintuitive behavior. Weigh the tradeoffs accordingly.

Footnotes:

* "Generalized" collections for which we can "Traverse" at least "Once", but maybe not more, in some order which may or may not be sequential, e.g. perhaps parallel.

** Same thing as GenTraversableOnce except not "General" because it guarantees sequential access.

*** TraversableLike gets around this by forcibly calling seq on the GenTraversableOnce internally, but I feel like that's cheating people out of parallelism when they might have otherwise expected it. Force callers to decide whether they want to give up their parallelism; don't do it invisibly for them.

badcook
  • Thanks for the extensive answer. Just one thing: `TraversableLike` takes two type parameters: `trait TraversableLike[+A, +Repr]`, and I had to define it as `S[A] <: TraversableLike[A, S[A]]`. – Wickoo Mar 09 '16 at 15:49
  • That's what I get for making changes on the fly and not verifying they actually compile. I'll correct it and thanks! – badcook Mar 10 '16 at 04:57
0

Preliminarily, here are the imports needed for all of the code in this answer:

import collection.GenTraversableOnce
import collection.generic.CanBuildFrom

Start by looking at the API doc to see the method signature for Iterable.++ (note that the API docs for most collection methods show a simplified signature by default, so you need to click "Full Signature" to see the real type):

def ++[B >: A, That](that: GenTraversableOnce[B])
  (implicit bf: CanBuildFrom[Iterable[A], B, That]): That

From there you can just do a straightforward translation from an instance method to a function:

def merge[A, B >: A, That](it: Iterable[A], that: GenTraversableOnce[B])
  (implicit bf: CanBuildFrom[Iterable[A], B, That]): That = it ++ that

Breaking this down:

  • [A, B >: A, That]: Iterable has one type parameter A, and ++ has two type parameters B and That, so the resulting function has all three type parameters A, B, and That
  • it: Iterable[A]: the method belongs to Iterable[A], so we made that the first value parameter
  • that: GenTraversableOnce[B])(implicit bf: CanBuildFrom[Iterable[A], B, That]): That: the remaining parameter and type constraint, copied directly from the signature of ++
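As a quick check of what this signature gives back, here is a small sketch, assuming Scala 2.12 or earlier, where CanBuildFrom and GenTraversableOnce exist. IterableMergeDemo and merged are illustrative names, and the merge definition is repeated so the block stands alone.

```scala
import scala.collection.GenTraversableOnce
import scala.collection.generic.CanBuildFrom

object IterableMergeDemo {
  def merge[A, B >: A, That](it: Iterable[A], that: GenTraversableOnce[B])
    (implicit bf: CanBuildFrom[Iterable[A], B, That]): That = it ++ that

  // The implicit search starts from Iterable[Int], so the result is typed
  // as Iterable[Int] even though both arguments are Lists.
  val merged: Iterable[Int] = merge(List(1, 2), List(2, 3))
}
```

Because the first parameter is fixed to Iterable[A], the static result type is Iterable[Int] rather than List[Int]; annotating merged as List[Int] should not compile, which is the limitation the other answer addresses.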
Chris Martin