
I have old code with a lot of methods like long[] toLongArray(int[] array), but for many different primitive type configurations (on both sides), and I wonder whether it is possible to replace them with one generic method - without losing performance.
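For reference, one of these existing methods is just a plain copy loop with an implicit widening cast, roughly like this (a sketch; the real methods differ only in the primitive types involved):

static long[] toLongArray(int[] array) {
    long[] newArray = new long[array.length];
    for (int i = 0; i < array.length; i++) {
        newArray[i] = array[i]; // implicit int -> long widening
    }
    return newArray;
}
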
First I created a simple method using MethodHandles for the int[] -> long[] pair:

static final MethodHandle getIntElement  = MethodHandles.arrayElementGetter(int[].class);
static final MethodHandle setLongElement = MethodHandles.arrayElementSetter(long[].class);
static long[] specializedMethodHandle(int[] array) throws Throwable {
    long[] newArray = new long[array.length];
    for (int i = 0; i < array.length; i++) setLongElement.invokeExact(newArray, i, (long) (int) getIntElement.invokeExact(array, i));
    return newArray;
}

And it works great - same performance as a manual loop - so I decided to make this generic:

static Map<Class<?>, MethodHandle> metHanGettersObj = Map.of(int[].class,
        MethodHandles.arrayElementGetter(int[].class).asType(MethodType.methodType(Object.class, Object.class, int.class)));
static Map<Class<?>, MethodHandle> metHanSettersObj = Map.of(long[].class,
        MethodHandles.arrayElementSetter(long[].class).asType(MethodType.methodType(void.class, Object.class, int.class, Object.class)));
static <F, T> T genericMethodHandleObject(Class<T> to, F array) throws Throwable {
    int length = Array.getLength(array);
    Object newArray = Array.newInstance(to.getComponentType(), length);
    MethodHandle getElement = metHanGettersObj.get(array.getClass());
    MethodHandle setElement = metHanSettersObj.get(to);
    for (int i = 0; i < length; i++) setElement.invokeExact(newArray, i, getElement.invokeExact(array, i));
    return (T) newArray;
}

But this is much, much slower: for my example array of 500,000 elements it was over 15x slower.
What is interesting is that a CompiledScript made with the Nashorn JavaScript engine (a simple copy loop inside) is around 20% faster than this code.

So I wonder if someone knows another way to do this? I will probably not use it anywhere, as it is starting to be too "hacky", but now I just need to know if it is possible at all - the non-generic method with method handles works fine, so why is this one so slow, and is it possible to make it faster?

GotoFinal
  • How are you benchmarking this? – Jorn Vernee Jun 24 '18 at 11:57
    "possible to make one generic method for this" no, because there is no useful common supertype. Anything you do will be reflective, and that will be less performant than individual methods. – Andy Turner Jun 24 '18 at 12:05
  • @JornVernee yes, otherwise I would not know if it is slower or not. (simple benchmark in JMH) – GotoFinal Jun 24 '18 at 12:05
  • @AndyTurner MethodHandles can be as fast as normal code - but only if used right, and not in all cases; in some places a lambda factory + method handles will help. But here so far I can't find a better way. And if you do this using reflection and the `Array` class it will be an additional 15x slower than this MethodHandle version - so this is already better than reflection, as method handles are not just reflection. – GotoFinal Jun 24 '18 at 12:08
  • No seriously, how are you benchmarking this? – Makoto Jun 24 '18 at 15:29
  • @Makoto you can check the answer - my benchmarks are nearly the same. – GotoFinal Jun 24 '18 at 19:15

1 Answer


You can bootstrap together an array-converter method handle, which you then cache in some static map.

Here's a benchmark including the code. The convertBootstrap method creates the converter; that's where the real magic happens:

@BenchmarkMode({ Mode.AverageTime })
@Warmup(iterations = 10, batchSize = 1)
@Measurement(iterations = 10, batchSize = 1)
@Fork(1)
@State(Scope.Thread)
public class MyBenchmark {

    int[] input;

    static final Map<Class<?>, Map<Class<?>, Function<?, ?>>> cacheGeneric = new HashMap<>();

    @Setup
    public void setup() {
        input = new Random(1).ints().limit(500_000).toArray();
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public long[] manual() {
        long[] result = new long[input.length];
        for(int i = 0 ; i < input.length; i++) {
            result[i] = input[i];
        }
        return result;
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public long[] cachedGeneric() {
        return getWrapped(int[].class, long[].class).apply(input);
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public long[] reflective() throws Throwable {
        return genericMethodHandleObject(long[].class, input);
    }

    static Map<Class<?>, MethodHandle> metHanGettersObj = Map.of(int[].class,
            MethodHandles.arrayElementGetter(int[].class).asType(MethodType.methodType(Object.class, Object.class, int.class)));
    static Map<Class<?>, MethodHandle> metHanSettersObj = Map.of(long[].class,
            MethodHandles.arrayElementSetter(long[].class).asType(MethodType.methodType(void.class, Object.class, int.class, Object.class)));
    static <F, T> T genericMethodHandleObject(Class<T> to, F array) throws Throwable {
        int length = Array.getLength(array);
        Object newArray = Array.newInstance(to.getComponentType(), length);
        MethodHandle getElement = metHanGettersObj.get(array.getClass());
        MethodHandle setElement = metHanSettersObj.get(to);
        for (int i = 0; i < length; i++) setElement.invokeExact(newArray, i, getElement.invokeExact(array, i));
        return (T) newArray;
    }

    @SuppressWarnings("unchecked")
    public static <F, T> Function<F, T> getWrapped(Class<F> from, Class<T> to) {
        return (Function<F, T>) cacheGeneric.computeIfAbsent(from, k -> new HashMap<>())
            .computeIfAbsent(
                to, k -> {
                    MethodHandle mh = convertBootstrap(from, to);
                    return arr -> {
                        try {
                            return (T) mh.invoke(arr);
                        } catch (Throwable e) {
                            throw new RuntimeException(e);
                        }
                    };
                });
    }

    public static MethodHandle convertBootstrap(Class<?> from, Class<?> to) {
        MethodHandle getter = arrayElementGetter(from); // (from, int) -> fromComponent
        MethodHandle setter = arrayElementSetter(to);   // (to, int, toComponent) -> void

        // cast the element on the way in: (to, int, fromComponent) -> void
        MethodHandle body = explicitCastArguments(setter, methodType(void.class, to, int.class, from.getComponentType()));
        body = collectArguments(body, 2, getter); // get from 1 array, set in other: (to, int, from, int) -> void
        body = permuteArguments(body, methodType(void.class, to, int.class, from), 0, 1, 2, 1); // share one index: (to, int, from) -> void
        body = collectArguments(identity(to), 1, body); // create pass-through for first argument: (to, to, int, from) -> to
        body = permuteArguments(body, methodType(to, to, int.class, from), 0, 0, 1, 2); // loop body shape: (to, int, from) -> to

        MethodHandle lenGetter = arrayLength(from);               // (from) -> int
        MethodHandle cons = MethodHandles.arrayConstructor(to);   // (int) -> to
        MethodHandle init = collectArguments(cons, 0, lenGetter); // allocate the target array: (from) -> to

        MethodHandle loop = countedLoop(lenGetter, init, body);   // whole converter: (from) -> to
        return loop;
    }
}
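
For what it's worth, the bootstrapped handle can also be invoked directly. A small sketch (assumed, not part of the benchmark above) for a float[] -> double[] conversion:

// Sketch; needs a throws Throwable context. The handle's type is (float[])double[].
MethodHandle floatToDouble = convertBootstrap(float[].class, double[].class);
double[] doubles = (double[]) floatToDouble.invokeExact(new float[] { 1f, 2f, 3f });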

Benchmark results are about the same for my method and the manual one (a lower score is better):

# JMH version: 1.19
# VM version: JDK 10.0.1, VM 10.0.1+10

Benchmark                  Mode  Cnt   Score   Error  Units
MyBenchmark.cachedGeneric  avgt   10   1.175 ± 0.046  ms/op
MyBenchmark.manual         avgt   10   1.149 ± 0.098  ms/op
MyBenchmark.reflective     avgt   10  10.165 ± 0.665  ms/op

I was actually really surprised by how well this is optimized (unless I made a mistake in the benchmark somewhere, but I can't find it). If you increase the number of elements to 5 million, you can see the difference again:

Benchmark                  Mode  Cnt    Score    Error  Units
MyBenchmark.cachedGeneric  avgt   10  277.764 ± 14.217  ms/op
MyBenchmark.manual         avgt   10   14.851 ±  0.317  ms/op
MyBenchmark.reflective     avgt   10   76.599 ±  3.695  ms/op

Those numbers suggest to me that some loop-unrolling/inlining/other limit is being hit, though, since the difference is suddenly a lot bigger.

You will probably also see a performance drop when the array types are not statically known.
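
To illustrate what I mean by "not statically known", a hypothetical convertDynamic helper (a sketch, not benchmarked) where both array classes only arrive as Class<?> values at run time, so the call goes through an unchecked Function<Object, Object>:

@SuppressWarnings({ "rawtypes", "unchecked" })
static Object convertDynamic(Object array, Class<?> targetArrayType) {
    // Raw casts because neither array type is known at compile time.
    Function<Object, Object> converter =
            (Function<Object, Object>) getWrapped((Class) array.getClass(), (Class) targetArrayType);
    return converter.apply(array);
}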

Jorn Vernee
  • You used `from` in computeIfAbsent twice; otherwise it seems to work. What is interesting is the warmup - at the beginning it is very slow, but then it kicks in and is like 80x faster. I need to check this with something more "dynamic" to see if it will break when more different types are used. – GotoFinal Jun 24 '18 at 16:45
  • @GotoFinal Oops, yeah that's a typo. I also benchmarked against your solution and saw it run about 8x slower than the 2 methods I've shown here. I guess how well this works also depends on the machine/JDK. And yeah, the fast path for method handles is really fast; they gave it a lot of attention. The biggest downside imho is that putting them together (using sticks to build a forest) is pretty hard. – Jorn Vernee Jun 24 '18 at 16:47
  • Weird, can you link the whole benchmark? Maybe it is too short? – GotoFinal Jun 24 '18 at 16:51
  • What is even weirder - for some benchmarks your method seems to be faster than the normal loop (but with that big warmup time). – GotoFinal Jun 24 '18 at 16:55
  • @GotoFinal I've added the benchmark of your solution. – Jorn Vernee Jun 24 '18 at 16:56
  • OK, that faster thing was my fault - I added volatile but didn't notice that you are accessing that field directly in the loop - so it added a lot of overhead. – GotoFinal Jun 24 '18 at 16:58
  • I can't reproduce that slower `cachedGeneric` case - for me it is always faster than `reflective`, interesting... Anyway, I think I will accept your answer, but that warmup time is so big that it would be a pain to use in most cases :D But this is still amazing code and solves this question - I didn't even know it was possible to do it like that. – GotoFinal Jun 24 '18 at 17:34
  • OK, reproduced it with an even larger array - 100_000_000 elements. – GotoFinal Jun 24 '18 at 17:45
  • @GotoFinal The warmup is larger because the JIT compiler hasn't kicked in yet at that point. The JVM mostly optimizes for peak performance, not startup. You should expect to see larger warmup times that then drop off. The simplicity of the manual loop just means that there's not much to optimize, so you don't see higher warmup. – Jorn Vernee Jun 24 '18 at 17:52
  • I did another test - I used @Fork(0) to first warm up this method with a small array, and then use a larger one... and it runs fine then - as fast as the manual way. So for some reason Java can't optimize this handle if the loop is always long - but it works great if you first warm it up with small loops. Magic of JIT... :D And about the long warmup - for this code it is longer than for most other code - but it seems to be better if you first use a small array to warm it up... weird. – GotoFinal Jun 24 '18 at 18:10