I have a sequence of key-value pairs (String, Int), and I want to group them by key into a sequence of values (i.e. Seq[(String, Int)] => Map[String, Iterable[Int]]).

Obviously, toMap isn't useful here, and groupBy maintains the values as tuples. The best I managed to come up with is:

val seq: Seq[( String, Int )]
// ...
seq.groupBy( _._1 ).mapValues( _.map( _._2 ) )

Is there a cleaner way of doing this?

Jean-Philippe Pellet
Tomer Gabel

4 Answers

Here's a "pimp my library" enrichment (an implicit wrapper) that adds a toMultiMap method to traversables. Would it solve your problem?

import collection._
import mutable.Builder
import generic.CanBuildFrom

class TraversableOnceExt[CC, A](coll: CC, asTraversable: CC => TraversableOnce[A]) {

  def toMultiMap[T, U, That](implicit ev: A <:< (T, U), cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] =
    toMultiMapBy(ev)

  def toMultiMapBy[T, U, That](f: A => (T, U))(implicit cbf: CanBuildFrom[CC, U, That]): immutable.Map[T, That] = {
    val mutMap = mutable.Map.empty[T, mutable.Builder[U, That]]
    for (x <- asTraversable(coll)) {
      val (key, value) = f(x)
      val builder = mutMap.getOrElseUpdate(key, cbf(coll))
      builder += value
    }
    val mapBuilder = immutable.Map.newBuilder[T, That]
    for ((k, v) <- mutMap)
      mapBuilder += ((k, v.result))
    mapBuilder.result
  }
}

implicit def commonExtendTraversable[A, C[A] <: TraversableOnce[A]](coll: C[A]): TraversableOnceExt[C[A], A] =
  new TraversableOnceExt[C[A], A](coll, identity)

Which can be used like this:

val map = List(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(map)  // Map(1 -> List(a, à), 2 -> List(b))

val byFirstLetter = Set("abc", "aeiou", "cdef").toMultiMapBy(elem => (elem.head, elem))
println(byFirstLetter) // Map(c -> Set(cdef), a -> Set(abc, aeiou))

If you add the following implicit defs, it will also work with collection-like objects such as Strings and Arrays:

implicit def commonExtendStringTraversable(string: String): TraversableOnceExt[String, Char] =
  new TraversableOnceExt[String, Char](string, implicitly)

implicit def commonExtendArrayTraversable[A](array: Array[A]): TraversableOnceExt[Array[A], A] =
  new TraversableOnceExt[Array[A], A](array, implicitly)

Then:

val withArrays = Array(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap
println(withArrays) // Map(1 -> [C@377653ae, 2 -> [C@396fe0f4)

val byLowercaseCode = "Mama".toMultiMapBy(c => (c.toLower.toInt, c))
println(byLowercaseCode) // Map(97 -> aa, 109 -> Mm)
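
The code above relies on CanBuildFrom, which was removed in the Scala 2.13 collections redesign. As a rough sketch only, assuming Scala 2.13 or later, something similar could be built on groupMap. The names MultiMapSyntax and MultiMapOps are made up for this example, and unlike the original the static value type is just Iterable[V] (the runtime collection still follows the source collection):

```scala
// Sketch for Scala 2.13+; MultiMapSyntax and MultiMapOps are hypothetical names.
object MultiMapSyntax {
  implicit class MultiMapOps[A](private val coll: Iterable[A]) extends AnyVal {

    // Split each element into a (key, value) pair once, then group the values by key.
    def toMultiMapBy[K, V](f: A => (K, V)): Map[K, Iterable[V]] =
      coll.map(f).groupMap(_._1)(_._2)

    // For collections whose elements are already pairs.
    def toMultiMap[K, V](implicit ev: A <:< (K, V)): Map[K, Iterable[V]] =
      toMultiMapBy(ev)
  }
}
```

After `import MultiMapSyntax._`, calls such as `List(1 -> 'a', 1 -> 'à', 2 -> 'b').toMultiMap` should work as in the examples above. Strings and Arrays are not covered by this sketch.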
Jean-Philippe Pellet
  • Much more than I bargained for, but useful nonetheless. Thanks! – Tomer Gabel May 29 '12 at 09:57
  • This is very nice. Is there a simple way to override the type of the values collection? For example, I have a `List(1 -> 'a', 1 -> 'à', 2 -> 'b')`, but I want the result of `toMultiMap` to be `Map[Int, Set[String]]`. Some trick with `breakOut` perhaps? – Tomáš Dvořák Nov 11 '16 at 09:48

There's no method or data structure in the standard library to do this, and your solution looks about as concise as you'll get. If you use this in more than one place, you might like to factor it out into a utility method:

def groupTuples[A, B](seq: Seq[(A, B)]) = 
  seq groupBy (_._1) mapValues (_ map (_._2))

which you then obviously just call with groupTuples(seq). This might not be the most efficient possible in terms of CPU clock cycles, but I don't think it's particularly inefficient either.

I did a rough benchmark against Jean-Philippe's solution on a list of 9 tuples and this is marginally faster. Both were about twice as fast as folding the sequence into a map (effectively re-implementing groupBy to give the output you want).
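
For reference, the "fold into a map" variant mentioned above could look roughly like the following. This is only a sketch: the helper name groupTuplesByFold and the choice of Vector for the values are assumptions, and it is not necessarily the exact code that was benchmarked.

```scala
object GroupTuples {
  // Hypothetical helper; may differ from the fold that was actually benchmarked.
  def groupTuplesByFold[A, B](seq: Seq[(A, B)]): Map[A, Vector[B]] =
    seq.foldLeft(Map.empty[A, Vector[B]]) { case (acc, (key, value)) =>
      // Append the value to the group collected so far for this key.
      acc.updated(key, acc.getOrElse(key, Vector.empty[B]) :+ value)
    }
}
```

Each step creates an updated immutable map, which is one plausible reason for this approach being slower than groupBy.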

Andrii Abramov
Luigi Plinge
  • `mapValues` actually just wraps the built map, so it may become less efficient when looking things up in the map. Besides, I’ve edited my answer to avoid a `toMap`; would you mind (out of curiosity) running the same benchmark again? With 9 tuples, building the map and two lookups is about a third quicker with my proposal according to my benchmarks. – Jean-Philippe Pellet May 28 '12 at 21:27
  • @Jean-Philippe I'm getting 97ms for 100k runs for the above, and 106ms with your updated one. Of course, we should really try different list lengths and compositions, but I just wanted to get a ballpark idea. In practical terms they're the same speed. – Luigi Plinge May 28 '12 at 23:11
  • @Jean-Philippe Interesting about `mapValues` wrapping the existing map - I didn't know that. Creating a whole new map with `seq groupBy (_._1) map (x => (x._1, x._2 map (_._2)))` takes 165 ms, so for creating a new map in memory yours is faster. – Luigi Plinge May 28 '12 at 23:26
  • Efficiency actually wasn't a huge concern, but it's nice to know the effect regardless. It's a shame I can't accept two answers - I'll have to go with Jean-Philippe's answer on the simple grounds of it being so comprehensive :-) – Tomer Gabel May 29 '12 at 09:57
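
To illustrate the point about `mapValues` wrapping the underlying map, here is a small sketch that counts how often the mapping function runs. It assumes Scala 2.13 or later, where the lazy behaviour is spelled `.view.mapValues` (on 2.12 and earlier, plain `mapValues` behaves the same way). The object and method names are made up for the example.

```scala
object MapValuesDemo {
  // Returns three counts: calls made while building the view, calls after two
  // lookups of the same key in the view, and extra calls caused by two lookups
  // in a strict copy of the view.
  def evaluationCounts(): (Int, Int, Int) = {
    var calls = 0
    val grouped = Seq("a" -> 1, "a" -> 2, "b" -> 3).groupBy(_._1)

    // Nothing is computed here; the function is only stored.
    val lazyValues = grouped.view.mapValues { pairs => calls += 1; pairs.map(_._2) }
    val afterBuild = calls

    // Every lookup re-applies the function to the underlying value.
    lazyValues("a")
    lazyValues("a")
    val afterViewLookups = calls

    // toMap materialises the values, so later lookups cost nothing extra.
    val strict = lazyValues.toMap
    val beforeStrictLookups = calls
    strict("a")
    strict("a")

    (afterBuild, afterViewLookups, calls - beforeStrictLookups)
  }
}
```

Under these assumptions the view does no work up front but repeats it on every access, which matches the trade-off discussed in the comments above.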

I don't know if you consider it cleaner:

seq.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
Johnny Everson

Starting with Scala 2.13, most collections provide a groupMap method, which is (as its name suggests) a more efficient equivalent of a groupBy followed by mapValues:

List(1 -> 'a', 1 -> 'b', 2 -> 'c').groupMap(_._1)(_._2)
// Map[Int,List[Char]] = Map(2 -> List(c), 1 -> List(a, b))

This:

  • groups elements based on the first part of tuples (Map(2 -> List((2,c)), 1 -> List((1,a), (1,b))))

  • maps grouped values (List((1,a), (1,b))) by taking their second tuple part (List(a, b)).
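
Applied to the types from the question, the two steps above can be sketched like this (Scala 2.13+; the sample data and the object name are made up for illustration):

```scala
object GroupMapDemo {
  // Made-up sample data with the question's element type.
  val seq: Seq[(String, Int)] = Seq("a" -> 1, "b" -> 2, "a" -> 3)

  // One call: key by the first tuple element, keep only the second.
  val grouped: Map[String, Seq[Int]] = seq.groupMap(_._1)(_._2)

  // The two-step version, made strict with map instead of mapValues.
  val viaGroupBy: Map[String, Seq[Int]] =
    seq.groupBy(_._1).map { case (k, v) => (k, v.map(_._2)) }
}
```

Both values should hold the same map, with the values for each key in their original order.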

Xavier Guihot