1

I am trying to perform a rotate left (ROTL) operation on an arbitrary amount of rotations on a 64-bit word currently represented as a byte array of 8 bytes in a JavaCard smart card.

The ugly way around is to hard-code all 64 possible permutations of ROTL on a 64-bit word represented as an 8 byte array but that will simply bloat the entire codebase up.

How do I make it leaner so that I can on-the-fly do any arbitrary amount of ROTL operations on demand on a 64-bit word (in byte array) with the use of byte and short types only (due to JavaCard not recognizing more complex things like int or long and so on) without need for hard-coding all ROTL64 permutations.

phuclv
  • 37,963
  • 15
  • 156
  • 475
thotheolh
  • 7,040
  • 7
  • 33
  • 49
  • I'm not very familiar with JavaCard but it seems to support shift-by-variable and bitwise operations, building a multi-byte variable-rotate out of that doesn't seem fundamentally problematic (really the same as it would be in general Java, but with more casts to avoid `int`), what am I missing here? – harold Sep 28 '18 at 09:48
  • You could do shift by 8 bit or 16 bit short type but beyond that the int, long and so on are not available. – thotheolh Sep 28 '18 at 10:36
  • OK good, there are no `int` or `long` in this problem anyway so their non-availability doesn't really affect it – harold Sep 28 '18 at 10:46
  • 1
    related? [Fast bit shift of a byte array - CMAC subkeys](https://stackoverflow.com/q/30350492/995714) – phuclv Sep 28 '18 at 14:57
  • Read the 2nd sentence I wrote. That topic attempts to hard code the permutations which is not what I am looking for. – thotheolh Sep 28 '18 at 15:02
  • @harold OK, I give in, *if* you have an input *and output* array then the problem should be easy to solve. It is quite a bit harder on Java card with just a single array. I tried to make a generic version for an array in a buffer and I had quite the time implementing it. – Maarten Bodewes Sep 29 '18 at 15:26

2 Answers2

1

The following method performs rotate right for any kind of array in a buffer, without requiring an additional output or temporary array:

/**
 * Rotates the indicated bytes in the given buffer to the right by a certain
 * number of bits.
 * 
 * @param buf
 *            the buffer in which the bits need to be rotated
 * @param off
 *            the offset in the buffer where the rotation needs to start
 * @param len
 *            the amount of bytes
 * @param rot
 *            the amount to rotate (any 31 bit value allowed)
 */
public static void rotr(byte[] buf, short off, short len, short rot) {
    if (len == 0) {
        // nothing to rotate (avoid division by 0)
        return;
    }

    final short lenBits = (short) (len * BYTE_SIZE);
    // doesn't always work for edge cases close to MIN_SHORT / MAX_SHORT
    rot = (short) ((rot + lenBits) % lenBits);

    // reused variables for byte and bit shift
    short shift, i;
    byte t1, t2;

    // --- byte shift

    shift = (short) (rot / BYTE_SIZE);

    // only shift when actually required
    if (shift != 0) {

        // values will never be used, src == start at the beginning
        short start = -1, src = -1, dest;

        // compiler is too stupid to see t1 will be assigned anyway
        t1 = 0;

        // go over all the bytes, but in stepwise fashion
        for (i = 0; i < len; i++) {
            // if we get back to the start
            // ... then we need to continue one position to the right
            if (src == start) {
                start++;
                t1 = buf[(short) (off + (++src))];
            }

            // calculate next location by stepping by the shift amount
            // ... modulus the length of course
            dest = (short) ((src + shift) % len);

            // save value, this will be the next one to be stored
            t2 = buf[(short) (off + dest)];
            // store value, doing the actual shift
            buf[(short) (off + dest)] = t1;

            // do the step
            src = dest;
            // we're going to store t1, not t2
            t1 = t2;
        }
    }

    // --- bit shift

    shift = (short) (rot % BYTE_SIZE);

    // only shift when actually required
    if (shift != 0) {

        // t1 holds previous byte, at other side
        t1 = buf[(short) (off + len - 1)];
        for (i = 0; i < len; i++) {
            t2 = buf[(short) (off + i)];
            // take bits from previous byte and this byte together
            buf[(short) (off + i)] = (byte) ((t1 << (BYTE_SIZE - shift)) | ((t2 & BYTE_MASK) >> shift));
            // current byte is now previous byte
            t1 = t2;
        }
    }
}

private static final short BYTE_MASK = 0xFF;
private static final short BYTE_SIZE = 8;

A drawback is that it requires two passes over all the data: one of the byte shift and one for the bit shift. It skips them when they are not required (if you know that the skipping is never performed then you can easily remove those checks).

Here is the left-rotate. The left rotate is somewhat harder to implement itself - requiring a few more calculations, so the cost of the additional method call may be offset against that. If you rotate by using a literal then you can just use the rotr function of course, or calculate the rotation amount yourself.

public static void rotl(byte[] buf, short off, short len, short bits) {
    final short lenBits = (short) (len * BYTE_SIZE);
    bits = (short) ((bits + lenBits) % lenBits);
    // we don't care if we pass 0 or lenBits, rotr will adjust
    rotr(buf, off, len, (short) (lenBits - bits));
}

Note: A previous version required rotates over 64 bit, was more time constant and didn't have the offset included. It also required a 64 bit specific array with loop constants (which is now replaced by the much more generic inner if statement in the for loop of the byte shift). See the edits for other versions.


Rotating gets much easier when an output buffer is available: this implementation can consist of just the initialization part and the last 4 lines of code. Crypto code often only shifts by a constant size and by odd numbers, so just the last 4 lines can then be used, skipping the optimizations in case no (bit) shift needs to be performed.

I've deliberately used a slightly different interface for that which assumes 64 bits for the rotate, just to show a slightly different implementation.

public static void rotr64(byte[] inBuf, short inOff, byte[] outBuf, short outOff, short rot) {
    short byteRot = (short) ((rot & 0b00111000) >> 3); 
    short bitRot = (short) (rot & 0b00000111); 

    if (bitRot == 0) {

        if (byteRot == 0) {
            // --- no rotation
            return;
        }

        // --- only byte rotation
        for (short i = 0; i < LONG_BYTES; i++) {
            outBuf[(short) (outOff + (i + byteRot) % LONG_BYTES)] = inBuf[(short) (inOff + i)];
        }
    } else {
        // --- bit- and possibly byte rotation
        // note: also works for all other situations, but slower

        // put the last byte in t_lo
        short t = (short) (inBuf[inOff + LONG_BYTES - 1] & BYTE_MASK);
        for (short i = 0; i < LONG_BYTES; i++) {
            // shift t_lo into t_hi and add the next byte into t_lo
            t = (short) (t << BYTE_SIZE | (inBuf[(short) (inOff + i)] & BYTE_MASK));
            // find the byte to receive the shifted value within the short 
            outBuf[(short) (outOff + (i + byteRot) % LONG_BYTES)] = (byte) (t >> bitRot); 
        }
    }
}

private static final int LONG_BYTES = 8;
private static final short BYTE_MASK = 0xFF;
private static final short BYTE_SIZE = 8;

and it can be further simplified if the offset is always set to zero.

Here is the left rotate, in case you need a generic function for that:

public static void rotl64(byte[] inBuf, short inOff, byte[] outBuf, short outOff, short rot) {
    rotr64(inBuf, inOff, outBuf, outOff, (short) (64 - rot & 0b00111111));
}

Everything is tested against random input (a million runs or so, which takes less than a second on Java SE) although I do not offer warranty on the testing; please test yourself.

Maarten Bodewes
  • 90,524
  • 13
  • 150
  • 263
1

A very simple implementation which receives the four shorts in separate parameters

public static void rotateRight64(short x3, short x2, short x1, short x0,
                                 short rotAmount, short[] out)
{
    assert out.length() == 4;
    rotAmount &= (1 << 6) - 1;  // limit the range to 0..63
    if (rotAmount >= 16)
        rotateRight64(x0, x3, x2, x1, rotAmount - 16, out);
    else
    {
        out[0] = (short)((x0 >>> rotAmount) | (x1 << (16-rotAmount)));
        out[1] = (short)((x1 >>> rotAmount) | (x2 << (16-rotAmount)));
        out[2] = (short)((x2 >>> rotAmount) | (x3 << (16-rotAmount)));
        out[3] = (short)((x3 >>> rotAmount) | (x0 << (16-rotAmount)));
    }
}

It's a rotate right but it's easy to do a rotate left by rotating right 64 - rotAmount

Alternatively it can be done like this without the coarse shifting step

public static void rotateRight(short[] out, short[] in, short rotAmount) // in ror rotAmount
{
    assert out.length() == 4 && in.length() == 4 && rotAmount >= 0 && rotAmount < 64;

    const short shift     = (short)(rotAmount % 16);
    const short limbshift = (short)(rotAmount / 16);

    short tmp = in[0];
    for (short i = 0; i < 4; ++i)
    {
        short index = (short)((i + limbshift) % 4);
        out[i]  = (short)((in[index] >>> shift) | (in[index + 1] << (16 - shift)));
    }
}

This way it can be easily changed to an arbitrary-precision shift/rotate

If the input array is byte then you can change short[4] to byte[8] and change all the constants from 16 → 8 and from 4 → 8. In fact they can be generalized without problem, I'm just hard coding to keep it simple and easier to understand

phuclv
  • 37,963
  • 15
  • 156
  • 475
  • 1
    Your first function doesn't have any output and uses an `int` (which can be easily changed to a `short` or `byte` of course). The second requires an array, which means that a special array needs to be made on Java Card for it, especially if the data is present in a byte array. There are integers as well, but again, those can be easily removed. The answer is probably faster and less complex than mine, but it isn't Java Card compatible *yet*. – Maarten Bodewes Sep 29 '18 at 12:36
  • @MaartenBodewes thanks, I've fixed the return problem. I don't know what `int` you're talking about since in that fragment I only have `int rotAmount` – phuclv Sep 29 '18 at 13:49
  • 1
    Java Card generally doesn't implement `int` at all. Officially it is an "optional feature" but since the API doesn't use any integers it will be hard to find any implementations that do use `int` - or any applet designers that assume that it will be present. Any `int` is therefore one `int` too many. – Maarten Bodewes Sep 29 '18 at 13:56
  • thanks, I've never had a chance to know how it works. What a strange Java implementation – phuclv Sep 29 '18 at 14:04
  • 2
    It's fun, if you like that kind of fun. For instance, arrays are created as persistent arrays (!) and there is no garbage collection (!!) to speak of. So `new short[4]` - or any kind of array of shorts - already makes your implementation unpractical. It's fun, try it sometimes. It will throw you back to the old times of bit hacking, with the addition of doing this with signed shorts and bytes, all kinds of security requirements, conformance & security evals and pages of crypto that need to implemented in < 8 KiB of RAM :P – Maarten Bodewes Sep 29 '18 at 15:11