I have a Scala question. Imagine you're building code to handle different operations, e.g.:

operation match {
   case A => doA()
   case B => doB()
   case _ => throw new Exception("Unknown op: " + operation)
}

Now, imagine that later on you want to build a new version and extend that pattern match to handle operation C. How can you do it in such a way that operation resolution is still O(1)?

I mean, I could modify the code above to do:

   case _ => handleUnknownOperation(operation)

And a subclass could implement handleUnknownOperation to do:

operation match {
   case C => doC()
}

But this is clumsy, because it means that the C operation requires two lookups, i.e. O(2).

Any other ideas, or best practices for extending this kind of pattern-matching structure?

Cheers, Galder

leedm777
Galder Zamarreño
  • Why couldn't you add a `case C` directly in your code? – ptitpoulpe Feb 08 '12 at 13:45
  • Imagine we're talking of different versions of a protocol. Version 1 supports operations A and B, and Version 2 adds operation C. Assuming the original class is for Version 1, I don't want to pollute it with Version 2 stuff. I'd like to keep version 2 stuff in a subclass or different class altogether. – Galder Zamarreño Feb 08 '12 at 13:48
  • You could have an interface for the protocol, and have protocol 1 and 2 as subclasses of this protocol. Protocol 2 could even be a subclass of protocol 1. – ptitpoulpe Feb 08 '12 at 13:58
  • As you are already throwing around O-notations I cannot refrain from pointing out that O(1) = O(2). It's certainly a good question design-wise, but I'd leave out the performance discussion. – Frank Feb 08 '12 at 14:03
  • Sure, that's what I have, but I can't see how your suggestion helps here. Imagine protocol 1 has the first pattern match above. Any subclassing results in two pattern match lookups, unless I duplicate the first pattern match in the subclass, which I want to avoid. – Galder Zamarreño Feb 08 '12 at 14:05
  • @Frank, hmmm, surely it's better to do 1 case lookup than 2? Or am I missing something here? – Galder Zamarreño Feb 08 '12 at 14:06
  • *... is the root of all evil* (Knuth) - complete the sentence to find out what you're missing. Considering that executing operations is most likely what will take time it's hardly interesting to eliminate a single constant expression for the sake of performance. – Frank Feb 08 '12 at 14:28

3 Answers

To answer the original question: pattern matching is effectively translated into a series of if/else statements, i.e. a sequence of tests. The case _ at the end is just a fallthrough (with no associated test). So there is very little difference between having A, B and C in a single match, and having A and B in one match which then delegates to another match for C. Using the following as an example:

class T

case class A(v: Int) extends T
case class B(v: Int) extends T
case class C(v: Int) extends T

class Foo {
  def getOperation(t: T): Unit = {
    t match {
      case A(2) => println("A")
      case B(i) => println("B")
      case _ => unknownOperation(t)
    }
  }

  def unknownOperation(t: T): Unit = println("unknown operation t=" + t)
}

class Bar extends Foo {
  override def unknownOperation(t: T): Unit = t match {
    case C(i) => println("C")
    case _ => println("unknown operation t=" + t)
  }
}

Using jad to decompile Foo.class, we get:

public void getOperation(T t) {
label0:
    {
        T t1 = t;
        if(t1 instanceof A)
        {
            if(((A)t1).v() == 2)
            {
                Predef$.MODULE$.println("A");
                break label0;
            }
        } else
        if(t1 instanceof B)
        {
            Predef$.MODULE$.println("B");
            break label0;
        }
        unknownOperation(t);
    }
}

So, I would say that you shouldn't worry about performance[*].

However, from a design point of view I would probably switch things around a bit. Either use the Command pattern as Frank suggests, or, instead of overriding unknownOperation in the subclass, override getOperation so that it matches against C first and then delegates to super.getOperation(). That seems neater to me:

class Bar extends Foo {
  override def getOperation(t: T): Unit = t match {
    case C(i) => println("C")
    case _ => super.getOperation(t)
  }
}

[*] The caveat to this would be complexity. There is an issue in pre-2.10 versions of Scala in which the pattern matcher generates .class files which are too complex (nested extractors generate exponential-space bytecode), so if you're using a match which is very complex, this can cause problems. This is fixed in 2.10 with the virtual pattern matcher. If you're having this problem, then one of the workarounds for this bug is to split your pattern matches into different methods/classes.

Matthew Farwell
  • Thanks Matthew for the answer. I like your second suggestion. Based on some performance tests, I found pattern matching to be faster than map lookups, so that's one of the reasons I was leaning towards this option. Can't make it next week unfortunately but hope we can meet again soon :) – Galder Zamarreño Feb 08 '12 at 18:10
  • Matthew, same question I posted to Dave, is there a significant difference in performance between case pattern and Map lookup? – Galder Zamarreño Feb 09 '12 at 09:32

re: a handleUnknownOperation method, that's not a terribly clean OO design. For example, it becomes a bit odd when you want to handle a D operation in a v3 subclass. Instead, just create a handle method that subclasses can override:

class BaseProtocol {
  def handle(operation: Operation) = operation match {
    case A => doA()
    case B => doB()
    case _ => throw new Exception("Unknown op: " + operation)
  }
}

class DerivedProtocol extends BaseProtocol {
  override def handle(operation: Operation) = operation match {
    case C => doC()
    case _ => super.handle(operation)
  }
}
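
To make the v3 case concrete, here is a minimal self-contained sketch of the same chain extended by one more version. The Operation hierarchy, the extra D and E objects, the V3Protocol name and the String return values (standing in for doA() and friends) are all assumptions for illustration; the original post does not show how its operations are defined.

```scala
// Hypothetical operation types; the original post does not show their definitions.
sealed trait Operation
case object A extends Operation
case object B extends Operation
case object C extends Operation
case object D extends Operation
case object E extends Operation // deliberately unsupported by every version

class BaseProtocol {
  // Version 1: handles A and B, rejects everything else.
  def handle(operation: Operation): String = operation match {
    case A => "doA"
    case B => "doB"
    case _ => throw new IllegalArgumentException("Unknown op: " + operation)
  }
}

class DerivedProtocol extends BaseProtocol {
  // Version 2: tries C first, then falls back to the version 1 match.
  override def handle(operation: Operation): String = operation match {
    case C => "doC"
    case _ => super.handle(operation)
  }
}

class V3Protocol extends DerivedProtocol {
  // Version 3 (hypothetical): tries D first, then falls back to version 2,
  // which in turn falls back to version 1.
  override def handle(operation: Operation): String = operation match {
    case D => "doD"
    case _ => super.handle(operation)
  }
}
```

Each version only knows about its own additions, and an operation that no version supports still ends up at the throw in BaseProtocol. The cost is one extra test per level of the hierarchy for the oldest operations.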

re: efficiency, you are probably prematurely optimizing, but I won't let that stop me :-)

Are all of your operations objects? If so, you can replace the match statements with a Map.

class BaseProtocol {
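  // Explicit type so that keys are plain Operations and handlers.get(operation) type-checks
  def handlers: Map[Operation, () => Unit] = Map(A -> doA _, B -> doB _)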

  def handle(operation: Operation) =
    handlers.get(operation)
      .getOrElse(throw new Exception("Unknown op: " + operation))
      .apply()
}

class DerivedProtocol extends BaseProtocol {
  override def handlers = super.handlers + (C -> doC _)
}
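
One thing to be aware of with the snippet above: because handlers is a def, the map is rebuilt on every call to handle. A possible variation, sketched below under some assumptions, keeps the overridable def but caches its result once per instance. The Operation hierarchy, the table member and the String return values (standing in for doA() and friends) are made up for the example.

```scala
// Hypothetical operation types; the original post does not show their definitions.
sealed trait Operation
case object A extends Operation
case object B extends Operation
case object C extends Operation

class BaseProtocol {
  // Subclasses extend the map by overriding this method.
  protected def handlers: Map[Operation, () => String] =
    Map(A -> (() => "doA"), B -> (() => "doB"))

  // Built once per instance on first use; the call to handlers is virtual,
  // so a subclass instance caches the subclass's map.
  private lazy val table: Map[Operation, () => String] = handlers

  def handle(operation: Operation): String =
    table.get(operation) match {
      case Some(handler) => handler()
      case None => throw new IllegalArgumentException("Unknown op: " + operation)
    }
}

class DerivedProtocol extends BaseProtocol {
  // Version 2 adds C on top of whatever version 1 registered.
  override protected def handlers: Map[Operation, () => String] =
    super.handlers + (C -> (() => "doC"))
}
```

Whether the caching matters in practice is something only a profiler can tell you.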
leedm777
  • Dave, thanks for your suggestions. I had a question: is there a significant performance difference between pattern matching and map lookups? As said above, it seems like pattern matching resolves to if/else statements, so I guess it depends on how many cases you have. Cheers – Galder Zamarreño Feb 09 '12 at 09:31
  • According to the [performance characteristics of the Scala Collections API](http://docs.scala-lang.org/overviews/collections/performance-characteristics.html), `HashMap` lookup is effectively constant time (theoretical worst case is linear time, but practically that wouldn't happen). – leedm777 Feb 09 '12 at 14:55
  • Interestingly, though, for small maps (up to size 4 in Scala 2.9), `Map()` [builds specialized instances](https://github.com/scala/scala/blob/f0d28aa4851d7b4195c7d2c8aaf49d6b9eaafd86/src/library/scala/collection/immutable/Map.scala#L97) of `Map1` to `Map4`, which look up keys with chained if-else statements. But if the map is small, then big-O complexity isn't going to matter. If you need to micro-manage performance at that level, then you'll have to break out your profiler and measure it. – leedm777 Feb 09 '12 at 15:02

In line with my comments above, I am considering this question from a design perspective rather than a performance point of view.

So the question slightly shifts to: How can you write a protocol implementation supporting several possible operations, and make it versionable such that operation lookup can be adjusted?

One approach towards a solution has been given in the comments in terms of subclassing. In order to keep the operation resolution mechanism unchanged, I'd even suggest replacing the pattern matching with another lookup mechanism.

Here's an idea:

  • Base-class/trait CanPerformOperations. This class keeps a hash map that maps operations (looks like you're using case objects for that, which would work nicely) to functions, which perform the actual operation. It also provides methods to register operations, i.e. modify the hash map, and execute the operation associated with a command by looking up the operation function in the hash map and executing it.
  • Class ProtocolVersion1 extends CanPerformOperations registers operations doA and doB for A and B commands respectively.
  • Class ProtocolVersion2 extends ProtocolVersion1 additionally registers operation doC for the C command.
  • Class ProtocolVersion2_1 extends ProtocolVersion2 may register a new operation for the A command, because the behavior in version 2.1 differs from that in 1.0.

Basically, this separates the lookup and command execution mechanism into a trait/base-class and gives you full flexibility in defining the commands supported by concrete implementations.
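
The class layout described above could be sketched roughly as follows. This is only one possible reading of the idea: the Operation hierarchy, the register and execute method names, and the String return values (used here so the result of each handler is visible) are assumptions, not something taken from the question.

```scala
import scala.collection.mutable

// Hypothetical operation types; the question only hints that case objects are used.
sealed trait Operation
case object A extends Operation
case object B extends Operation
case object C extends Operation

trait CanPerformOperations {
  // Lookup table from operation to the function that performs it.
  private val handlers = mutable.HashMap.empty[Operation, () => String]

  // Registers a handler, replacing any handler already registered for the operation.
  protected def register(operation: Operation)(handler: () => String): Unit =
    handlers(operation) = handler

  // Looks up the handler in the hash map and runs it; unknown operations are rejected.
  def execute(operation: Operation): String =
    handlers.get(operation) match {
      case Some(handler) => handler()
      case None => throw new IllegalArgumentException("Unknown op: " + operation)
    }
}

class ProtocolVersion1 extends CanPerformOperations {
  register(A)(() => "doA v1")
  register(B)(() => "doB v1")
}

class ProtocolVersion2 extends ProtocolVersion1 {
  register(C)(() => "doC v2")
}

class ProtocolVersion2_1 extends ProtocolVersion2 {
  // Runs after the superclass constructors, so this replaces the version 1 handler for A.
  register(A)(() => "doA v2.1")
}
```

Every version resolves an operation with a single hash lookup, regardless of how deep the class hierarchy is, and a later version can redefine an existing operation simply by registering it again.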

Of course, all of this is kind of related to the Command pattern.

In principle, it's not an answer to your question per se, as the match is not extended. Practically, it achieves what you want to do while keeping the (amortized) constant complexity.

Frank