5

I wrote a function doing simple math:

def clamp(num: Double, min: Double, max: Double) =
  if (num < min) min else if (num > max) max else num

It was simple enough until I needed the same function for Long. I generalized it with a type parameter and specialization:

import Ordering.Implicits._
def clamp[@specialized N: Ordering](num: N, min: N, max: N) =
  if (num < min) min else if (num > max) max else num

It works, but I found that the bytecode does lots of boxing and unboxing under the hood:

public boolean clamp$mZc$sp(boolean num, boolean min, boolean max, Ordering<Object> evidence$1)
{
  return Ordering.Implicits..MODULE$.infixOrderingOps(BoxesRunTime.boxToBoolean(num), evidence$1).$greater(BoxesRunTime.boxToBoolean(max)) ? max : Ordering.Implicits..MODULE$.infixOrderingOps(BoxesRunTime.boxToBoolean(num), evidence$1).$less(BoxesRunTime.boxToBoolean(min)) ? min : num;
}

public byte clamp$mBc$sp(byte num, byte min, byte max, Ordering<Object> evidence$1)
{
  return Ordering.Implicits..MODULE$.infixOrderingOps(BoxesRunTime.boxToByte(num), evidence$1).$greater(BoxesRunTime.boxToByte(max)) ? max : Ordering.Implicits..MODULE$.infixOrderingOps(BoxesRunTime.boxToByte(num), evidence$1).$less(BoxesRunTime.boxToByte(min)) ? min : num;
}

public char clamp$mCc$sp(char num, char min, char max, Ordering<Object> evidence$1)
{
  return Ordering.Implicits..MODULE$.infixOrderingOps(BoxesRunTime.boxToCharacter(num), evidence$1).$greater(BoxesRunTime.boxToCharacter(max)) ? max : Ordering.Implicits..MODULE$.infixOrderingOps(BoxesRunTime.boxToCharacter(num), evidence$1).$less(BoxesRunTime.boxToCharacter(min)) ? min : num;
}

Is there any better way to do generalized arithmetic operations without boxing?

pocorall

3 Answers

3

The spire project is definitely the right place to look for high-performance numerical abstractions. All its typeclasses are specialized for common types such as Long, Double, Float, and Int.

Here is your method using spire typeclasses:

import spire.algebra._
import spire.implicits._
def clamp[@specialized T:Order](a: T, min: T, max: T) =
  if(a < min) min else if(a > max) max else a

And here is the specialized bytecode (Long version), extracted using :javap from the Scala REPL:

public long clamp$mJc$sp(long, long, long, spire.algebra.Order<java.lang.Object>);
    descriptor: (JJJLspire/algebra/Order;)J
    flags: ACC_PUBLIC
    Code:
      stack=5, locals=8, args_size=5
         0: aload         7
         2: lload_1
         3: lload_3
         4: invokeinterface #96,  5           // InterfaceMethod spire/algebra/Order.lt$mcJ$sp:(JJ)Z
         9: ifeq          16
        12: lload_3
        13: goto          35
        16: aload         7
        18: lload_1
        19: lload         5
        21: invokeinterface #99,  5           // InterfaceMethod spire/algebra/Order.gt$mcJ$sp:(JJ)Z
        26: ifeq          34
        29: lload         5
        31: goto          35
        34: lload_1
        35: lreturn

As you can see, it calls the Long-specialized versions of the lt and gt methods of spire.algebra.Order (lt$mcJ$sp and gt$mcJ$sp), so there is no boxing involved.

You may also notice that the translation from the operators (< and >) to typeclass method invocations leaves no trace in the bytecode: no wrapper object is created for the operators. The machinery behind this is quite elaborate; see this blog post from Erik Osheim, one of the main authors of spire.

But the bottom line is that the result is very fast even though the code is generic.
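To make the idea concrete without depending on spire, here is a minimal sketch of the same technique in plain Scala 2: a typeclass that is itself @specialized, and a clamp that calls its methods directly, which is roughly the shape the operator syntax is rewritten into. The names MyOrder and ClampSketch are hypothetical stand-ins invented for this illustration; they are not spire's API.

```scala
// Hypothetical stand-in for a specialized ordering typeclass (illustration only).
// Because the trait is @specialized, the compiler also generates primitive
// variants of lt and gt for Int, Long and Double.
trait MyOrder[@specialized(Int, Long, Double) A] {
  def lt(x: A, y: A): Boolean
  def gt(x: A, y: A): Boolean
}

object MyOrder {
  implicit val longOrder: MyOrder[Long] = new MyOrder[Long] {
    def lt(x: Long, y: Long): Boolean = x < y
    def gt(x: Long, y: Long): Boolean = x > y
  }
  implicit val doubleOrder: MyOrder[Double] = new MyOrder[Double] {
    def lt(x: Double, y: Double): Boolean = x < y
    def gt(x: Double, y: Double): Boolean = x > y
  }
}

object ClampSketch {
  // Calls the typeclass methods directly, so no operator wrapper is needed.
  def clamp[@specialized(Int, Long, Double) A](a: A, min: A, max: A)(implicit ev: MyOrder[A]): A =
    if (ev.lt(a, min)) min else if (ev.gt(a, max)) max else a
}
```

For example, ClampSketch.clamp(15L, 0L, 10L) resolves the Long instance implicitly. Whether a particular call site ends up box-free is best confirmed with :javap, as was done above for spire.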

Rüdiger Klaehn
2

This is not really a direct answer to the question, more of a comment, but it got to be longer than a comment, and I thought formatting would be useful.

The spire project grew out of the need to abstract over mathematical operations, so that generalized mathematical code can be written with minimal overhead.

The project certainly appears to perform pretty close to native functions in benchmarks, such as the one referenced in the previous article.

It achieves this by combining specialization with macros that rewrite code. The approach is described in this paper, which I think is from Scala Days 2012.

Given the results of the referenced benchmark, I'd imagine this project might meet your needs.

Mike Curry
0

As far as I know, there is no way to do it, because the Scala standard library uses @specialized very rarely and, in particular, Ordering is not specialized.

And even if it were, you would still have the overhead of calling Ordering.Implicits..MODULE$.infixOrderingOps. So context bounds are too high-level to help with such low-level optimizations.
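As a small sketch of the two costs mentioned above: calling the Ordering instance's own lt and gt methods avoids the infixOrderingOps wrapper, but since Ordering is not specialized, primitive arguments are still boxed on every comparison. The object name ClampOrdering is made up for this example.

```scala
object ClampOrdering {
  // No Ordering.Implicits import: we call ord.lt / ord.gt directly,
  // so no infix wrapper is created. The arguments are still boxed,
  // because Ordering[N] only has the generic (Object-based) methods.
  def clamp[N](num: N, min: N, max: N)(implicit ord: Ordering[N]): N =
    if (ord.lt(num, min)) min else if (ord.gt(num, max)) max else num
}
```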

So the only way I see to do generalized arithmetic operations without overhead is some form of code generation.
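For illustration, this is the kind of output such a generator might emit: one monomorphic overload per primitive type, written out by hand here. The object name ClampOverloads is hypothetical, and how the overloads get generated (a template, a build step, a macro) is left open.

```scala
object ClampOverloads {
  // Each overload works on primitives only, so nothing is boxed.
  def clamp(num: Int, min: Int, max: Int): Int =
    if (num < min) min else if (num > max) max else num

  def clamp(num: Long, min: Long, max: Long): Long =
    if (num < min) min else if (num > max) max else num

  def clamp(num: Double, min: Double, max: Double): Double =
    if (num < min) min else if (num > max) max else num
}
```

The trade-off is that callers cannot abstract over the element type; each supported type needs its own overload.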