I need to run the same algorithm on several primitive types; the versions differ only in the type of the variables. For instance:

/**
* Determine if <code>value</code> is the bitwise OR of elements of <code>validValues</code> array. 
* For instance, our valid choices are 0001, 0010, and 1000.
* We are given a value of 1001.  This is valid because it can be made from
* ORing together 0001 and 1000.
* On the other hand, if we are given a value of 1111, this is invalid because
* you cannot turn on the second bit from left by ORing together those 3
* valid values.
*/
public static boolean isValid(long value, long[] validValues) {
    for (long validOption : validValues) {
        value &= ~validOption;
    }
    return value == 0;
}

public static boolean isValid(int value, int[] validValues) {
    for (int validOption : validValues) {
        value &= ~validOption;
    }
    return value == 0;
}

How can I avoid this repetition? I know there's no way to genericize primitive arrays, so my hands seem tied. I have instances of primitive arrays and not boxed arrays of say Number objects, so I do not want to go that route either.

I know there are a lot of questions about primitives with respect to arrays, autoboxing, etc., but I haven't seen it formulated in quite this way, and I haven't seen a decisive answer on how to interact with these arrays.

I suppose I could do something like:

public static<E extends Number> boolean isValid(E value, List<E> numbers) {
    long theValue = value.longValue();
    for (Number validOption : numbers) {
        theValue &= ~validOption.longValue();
    }
    return theValue == 0;
}

and then

public static boolean isValid(long value, long[] validValues) {
    return isValid(value, Arrays.asList(ArrayUtils.toObject(validValues)));
}

public static boolean isValid(int value, int[] validValues) {
    return isValid(value, Arrays.asList(ArrayUtils.toObject(validValues)));
}

Is that really much better though? That way will create a lot more objects than the original implementation, though it DRYs up the source code. Any thoughts in this matter would be appreciated.

I82Much
  • Great question. Although I can't see a way to do it with generics, on a completely unrelated note, given the way iteration is managed in the 1.5 syntax, I'm guessing that you're autoboxing each number. I'd guess that it'd be faster to just do old-style array counting. – Steve B. Mar 19 '10 at 20:05
  • @I82Much: see my answer and the comment by Tom Hawtin here: http://stackoverflow.com/questions/2337170/managing-highly-repetitive-code-and-documentation-in-java Basically there's no "easy" way. You either have to deal with an insane amount of repetition or use a custom pre-processor or source-code instrumentation or some templating system. The very fact that even Sun is using such a technique proves that it has its use. Basically that is also how the amazing Trove primitives collections is doing it. See polygenelubricants' question and my +5 upvoted/accepted answer there. – SyntaxT3rr0r Mar 20 '10 at 03:26

4 Answers

I asked a similar question before (Managing highly repetitive code and documentation in Java), and noted that the source code for java.util.Arrays is highly repetitive in its algorithms to deal with primitive array types.

In fact, the source code contains this comment:

The code for each of the seven primitive types is largely identical. C'est la vie.

The answer that I accepted suggests the use of a code generator that lets you work with code templates instead. There's also a comment that Sun/Oracle uses a templating system internally as well.
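To give a rough idea of what the template approach looks like, here is a minimal sketch that stamps out one isValid overload per primitive type from a single string template. The class name IsValidGenerator and the @T@ placeholder are made up for illustration; a real build would more likely use an existing templating tool and write the output to source files.

```java
public class IsValidGenerator {

    // Hypothetical template; @T@ is an invented placeholder for the primitive type.
    private static final String TEMPLATE =
        "public static boolean isValid(@T@ value, @T@[] validValues) {\n" +
        "    for (@T@ validOption : validValues) {\n" +
        "        value &= ~validOption;\n" +
        "    }\n" +
        "    return value == 0;\n" +
        "}\n";

    // Returns the source text of one overload for the given primitive type name.
    public static String generate(String primitiveType) {
        return TEMPLATE.replace("@T@", primitiveType);
    }

    public static void main(String[] args) {
        // Print one overload per integral type; a build step would write these to a file.
        for (String type : new String[] {"byte", "short", "int", "long"}) {
            System.out.println(generate(type));
        }
    }
}
```

The algorithm then lives in exactly one place (the template), at the cost of an extra build step.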

You can also use reflection to reduce repetition, but this is likely to be slow, and perhaps not worth the effort. If you want to test out its performance, this snippet demonstrates the technique:

import java.lang.reflect.Array;

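// Returns a mask covering the low bits that belong to the given primitive type.
// For long (and any type not listed) the result is 0 - 1 = -1, i.e. all 64 bits set.
static long maskFor(Class<?> c) {
    return (
        c.equals(int.class) ? 1L << Integer.SIZE :
        c.equals(short.class) ? 1L << Short.SIZE :
        c.equals(byte.class) ? 1L << Byte.SIZE :
        0
    ) - 1;
}
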
public static void reflectPrimitiveNumericArray(Object arr) throws Exception {
    int length = Array.getLength(arr);
    Class<?> componentType = arr.getClass().getComponentType();
    long mask = maskFor(componentType);
    System.out.format("%s[%d] = { ", componentType, length);
    for (int i = 0; i < length; i++) {
        long el = Array.getLong(arr, i) & mask;
        System.out.print(Long.toBinaryString(el) + " ");
    }
    System.out.println("}");
}

You can pass an int[] for arr, as well as other primitive array types. Everything is cast into long, with bit-masking to address sign extension.

reflectPrimitiveNumericArray(new byte[] { (byte) 0xF0 });
// byte[1] = { 11110000 }
reflectPrimitiveNumericArray(new int[] { 0xF0F0F0F0 });
// int[1] = { 11110000111100001111000011110000 }
reflectPrimitiveNumericArray(new long[] { 0xF0F0F0F0F0F0F0F0L });
// long[1] = { 1111000011110000111100001111000011110000111100001111000011110000 }
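
Applied to the isValid method from the question, the same reflection technique could look roughly like the sketch below. The class name ReflectiveBitValidator is hypothetical, and I have not measured how much slower this is than the hand-written overloads. No masking is needed here: the value and every array element are sign-extended to long in the same way, so the extra high bits cancel out consistently.

```java
import java.lang.reflect.Array;

public class ReflectiveBitValidator {

    /**
     * Hypothetical helper: accepts any integral primitive array
     * (byte[], short[], char[], int[] or long[]) passed as a plain Object.
     */
    public static boolean isValid(long value, Object validValues) {
        // Throws IllegalArgumentException if validValues is not an array.
        int length = Array.getLength(validValues);
        for (int i = 0; i < length; i++) {
            // Array.getLong widens each element to long; it throws
            // IllegalArgumentException for boolean[], float[] or double[].
            value &= ~Array.getLong(validValues, i);
        }
        return value == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid(0b1001, new int[] {0b0001, 0b0010, 0b1000}));    // true
        System.out.println(isValid(0b1111L, new long[] {0b0001L, 0b0010L, 0b1000L})); // false
    }
}
```

The trade-off is that the compiler no longer checks the array type for you; passing the wrong kind of object only fails at runtime.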
polygenelubricants
  • -1 - My take is that unless the duplicated code amounts to thousands of lines, introducing code generation into the mix is likely to make the code HARDER to maintain ... especially for people not already familiar with the code generation technology. – Stephen C Mar 20 '10 at 02:54
  • I'm just sharing what I learned from my question. It's up to OP what to do with this knowledge. – polygenelubricants Mar 20 '10 at 03:00
  • @Stephen C: +1... There's really no need to downvote him (btw I wrote the accepted answer). You may not like source code generation, you may not like that other people are doing it... But that *IS* how people writing high-performance collections working with primitives do it. Go take a look at Trove. Their TLongLongHashMap/TIntIntHashMap etc. completely destroy the default Java collections memory- and performance-wise. And how do they do it? Source code generation/templates/pre-processors etc. There are various cases where these techniques are useful and where they provide a real benefit. – SyntaxT3rr0r Mar 20 '10 at 03:22
  • @WizardOfOdds - actually, I DO like source generation. (I won't bore you with details of my background, but trust me I *know* this stuff.) But, source generation is best used for model-to-source transformations, NOT as a glorified macro pre-processor. This is an example of the latter. – Stephen C Mar 20 '10 at 03:41
  • @Stephen C: consider also that this is just an example of one algorithm that he has. He may very well have plenty, and all together that may amount to thousands of lines. – polygenelubricants Mar 20 '10 at 03:42
  • @WizardOfOdds - I down voted because I think this is seriously bad advice. – Stephen C Mar 20 '10 at 03:45
  • @polygenelubricants - that is possible, but IMO unlikely. – Stephen C Mar 20 '10 at 03:46
  • Fortunately this is one of the few algorithms I need different versions for the different primitive types. It's just one more reason I'm irritated with Java... we're using code generation for another portion of the project but I'd prefer not to have to deal with it here. – I82Much Mar 21 '10 at 19:51

If you look in java.util.Arrays, you'll see that even they had to specialize all their algorithms (binarySearch, equals, etc.) for each primitive type.

I would not recommend relying on autoboxing if performance is an issue, but if not (after you've profiled), it would be a valid option.
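
If the goal is only to avoid boxing while keeping a single copy of the loop, one possible middle ground is to widen the int[] to a long[] and delegate to the long overload. This is a sketch that assumes Java 8 or later (for Arrays.stream and IntStream.asLongStream); the class name WideningBitValidator is made up, and the conversion still allocates one temporary long[] per call.

```java
import java.util.Arrays;

public class WideningBitValidator {

    // The only copy of the actual algorithm.
    public static boolean isValid(long value, long[] validValues) {
        for (long validOption : validValues) {
            value &= ~validOption;
        }
        return value == 0;
    }

    public static boolean isValid(int value, int[] validValues) {
        // asLongStream() widens each int to a primitive long, so no wrapper
        // objects are created; only one temporary long[] is allocated.
        long[] widened = Arrays.stream(validValues).asLongStream().toArray();
        return isValid((long) value, widened);
    }
}
```

Because the value and every element are sign-extended in the same way, the result matches what the int-only loop would return, including for negative inputs.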

Michael Myers

I'm afraid you're out of luck unless you move away from primitive types.

Tom

In a previous life, we had some primitive-typed collections that were optimized for financial data (millions of orders in memory, using chunking arrays and such). Our solution was much like Trove's, with some stub files. The 'original' source file would be, say, HashSet_key. With some stub classes for key and value, we followed Trove's model of using ant tasks to generate a HashSetInt, HashSetLong, etc.

I always thought this was a 'janky' method, but it worked. I am curious whether anyone has tried Velocity, or possibly FMPP, and had slightly cleaner results. The main problem I had with the ant solution is that all the code lives pretty close together, whereas in a lot of source code generation you may be able to separate template files better.

mgmiller