4

I am processing XML using scala, and I am converting the XML into my own data structures. Currently, I am using plain Map instances to hold (sub-)elements, however, the order of elements from the XML gets lost this way, and I cannot reproduce the original XML.

Therefore, I want to use LinkedHashMap instances instead of Map, however I am using groupBy on the list of nodes, which creates a Map:

For example:

  def parse(n:Node): Unit = 
  {
    val leaves:Map[String, Seq[XmlItem]] =
      n.child
        .filter(node => { ... })
        .groupBy(_.label)
        .map((tuple:Tuple2[String, Seq[Node]]) =>
        {
          val items = tuple._2.map(node =>
          {
            val attributes = ...

            if (node.text.nonEmpty)
              XmlItem(Some(node.text), attributes)
            else
              XmlItem(None, attributes)
          })

          (tuple._1, items)
        })

      ...
   }

In this example, I want leaves to be of type LinkedHashMap to retain the order of n.child. How can I achieve this?

Note: I am grouping by label/tagname because elements can occur multiple times, and for each label/tagname, I keep a list of elements in my data structures.


Solution
As answered by @jwvh I am using foldLeft as a substitution for groupBy. Also, I decided to go with LinkedHashMap instead of ListMap.

  def parse(n:Node): Unit = 
  {
    val leaves:mutable.LinkedHashMap[String, Seq[XmlItem]] =
      n.child
        .filter(node => { ... })
        .foldLeft(mutable.LinkedHashMap.empty[String, Seq[Node]])((m, sn) =>
        {
          m.update(sn.label, m.getOrElse(sn.label, Seq.empty[Node]) ++ Seq(sn))
          m
        })
        .map((tuple:Tuple2[String, Seq[Node]]) =>
        {
          val items = tuple._2.map(node =>
          {
            val attributes = ...

            if (node.text.nonEmpty)
              XmlItem(Some(node.text), attributes)
            else
              XmlItem(None, attributes)
          })

          (tuple._1, items)
        })
user826955
  • 3,137
  • 2
  • 30
  • 71

2 Answers2

1

Purely immutable solution would be quite slow. So I'd go with

import collection.mutable.{ArrayBuffer, LinkedHashMap}

implicit class ExtraTraversableOps[A](seq: collection.TraversableOnce[A]) {
  def orderedGroupBy[B](f: A => B): collection.Map[B, collection.Seq[A]] = {
    val map = LinkedHashMap.empty[B, ArrayBuffer[A]]

    for (x <- seq) {
      val key = f(x)
      map.getOrElseUpdate(key, ArrayBuffer.empty) += x
    }

    map
}

To use, just change .groupBy in your code to .orderedGroupBy.

The returned Map can't be mutated using this type (though it can be cast to mutable.Map or to mutable.LinkedHashMap), so it's safe enough for most purposes (and you could create a ListMap from it at the end if really desired).

Alexey Romanov
  • 167,066
  • 35
  • 309
  • 487
1

To get the rough equivalent to .groupBy() in a ListMap you could fold over your collection. The problem is that ListMap preserves the order of elements as they were appended, not as they were encountered.

import collection.immutable.ListMap

List('a','b','a','c').foldLeft(ListMap.empty[Char,Seq[Char]]){
  case (lm,c) => lm.updated(c, c +: lm.getOrElse(c, Seq()))
}
//res0: ListMap[Char,Seq[Char]] = ListMap(b -> Seq(b), a -> Seq(a, a), c -> Seq(c))

To fix this you can foldRight instead of foldLeft. The result is the original order of elements as encountered (scanning left to right) but in reverse.

List('a','b','a','c').foldRight(ListMap.empty[Char,Seq[Char]]){
  case (c,lm) => lm.updated(c, c +: lm.getOrElse(c, Seq()))
}
//res1: ListMap[Char,Seq[Char]] = ListMap(c -> Seq(c), b -> Seq(b), a -> Seq(a, a))

This isn't necessarily a bad thing since a ListMap is more efficient with last and init ops, O(1), than it is with head and tail ops, O(n).

To process the ListMap in the original left-to-right order you could .toList and .reverse it.

List('a','b','a','c').foldRight(ListMap.empty[Char,Seq[Char]]){
  case (c,lm) => lm.updated(c, c +: lm.getOrElse(c, Seq()))
}.toList.reverse
//res2: List[(Char, Seq[Char])] = List((a,Seq(a, a)), (b,Seq(b)), (c,Seq(c)))
jwvh
  • 50,871
  • 7
  • 38
  • 64
  • Thanks, I also came up with `foldLeft` after experimenting. Also, I switched to `LinkedHashMap`. My code now perfectly retains element order and I can re-create the original XML perfectly. I didn't have to switch to `foldRight` however as you said, it works with `foldLeft` for me. – user826955 Apr 02 '19 at 09:14