Java: IEEE Doubles to IBM Float

Question

I am working on a side project at work where I would like to read/write SAS Transport files. The challenge is that numbers are encoded as 64-bit IBM floating point numbers. While I have been able to find plenty of great resources for reading a byte array (containing an IBM float) into IEEE 32-bit and 64-bit floats, I'm struggling to find code to convert floats/doubles back to IBM floats.

I recently found some code for writing a 32-bit IEEE float back out to a byte array (containing an IBM float). It seems to be working, so I've been trying to translate it to a 64-bit version. I've reverse engineered where most of the magic numbers come from, but I've been stumped for over a week now.

I have also tried to translate the functions listed at the end of the SAS Transport documentation to Java, but I've run into a lot of issues related to endianness, Java's lack of unsigned types, and so on. Can anyone provide the code to convert doubles to IBM floating point format?

Just to show the progress I've made, here are some shortened versions of the code I've written so far:

This grabs a 32-bit IBM float from a byte array and generates an IEEE float:

public static double fromIBMFloat(byte[] data, int offset) {
    int temp = readIntFromBuffer(data, offset);
    int mantissa = temp & 0x00FFFFFF;
    int exponent = ((temp >> 24) & 0x7F) - 64;
    boolean isNegative = (temp & 0x80000000) != 0;
    double result = mantissa * Math.pow(2, 4 * exponent - 24);
    if (isNegative) {
        result = -result;
    }
    return result;
}

This is the same thing for 64-bit:

public static double fromIBMDouble(byte[] data, int offset) {
    long temp = readLongFromBuffer(data, offset);
    long mantissa = temp & 0x00FFFFFFFFFFFFFFL;
    long exponent = ((temp >> 56) & 0x7F) - 64;
    boolean isNegative = (temp & 0x8000000000000000L) != 0;
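    // The 64-bit format carries a 56-bit fraction, so scale by 2^-56 (not 2^-24 as in the 32-bit case).
    double result = mantissa * Math.pow(2, 4 * exponent - 56);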
    if (isNegative) {
        result = -result;
    }
    return result;
}

Great! These work for going to IEEE floats, but now I need to go the other way. This simple implementation seems to be working for 32-bit floats:

public static void toIBMFloat(double value, byte[] xport, int offset) {
    if (value == 0.0 || Double.isNaN(value) || Double.isInfinite(value)) {
        writeIntToBuffer(xport, offset, 0);
        return;
    }
    int fconv = Float.floatToIntBits((float)value);
    int fmant = (fconv & 0x007FFFFF) | 0x00800000;
    int temp = (fconv & 0x7F800000) >> 23;
    int t = (temp & 0xFF) - 126;
    while ((t & 0x3) != 0) {
        ++t;
        fmant >>= 1;
    }
    fconv = (fconv & 0x80000000) | (((t >> 2) + 64) << 24) | fmant;
    writeIntToBuffer(xport, offset, fconv);
}

Now, the only thing left is to translate that to work with 64-bit IBM floats. A lot of the magic numbers listed relate to the number of bits in the IEEE 32-bit floating point exponent (8 bits) and mantissa (23 bits). So for 64-bit, I just need to switch those to use the 11-bit exponent and 52-bit mantissa. But where does that 126 come from? What is the point of the 0x3 in the while loop?

Any help breaking down the 32-bit version so I can implement a 64-bit version would be greatly appreciated.

For the record, Java has all of the functionality needed to allow you to treat signed `int` and `long` values as unsigned; see the `Integer` and `Long` classes. — Stephen C, Mar 13 '19 at 23:03
I suggest you post the required format *here,* and your attempt. — user207421, Mar 13 '19 at 23:18
If I didn't misunderstand your reference, IBM provides ready to use LGPL code for that purpose including explanation: http://support.sas.com/techsup/technote/ts140.pdf — kai, Mar 13 '19 at 23:18
@kai Yeah, I've pored over the code in that reference for over a week now. I'm obviously losing something during translation to Java. I familiarized myself with the base 2 and base 16 formats as described on Wikipedia. I've considered endianness and ensuring right shifts aren't dragging leading 1's and still I see differences in my mantissas/exponents that ultimately lead to the wrong answer. — Travis Parks, Mar 13 '19 at 23:48
So if you have that, then why not post the concrete problem you are facing when implementing it? — kai, Mar 14 '19 at 01:09
Look into the R package https://haven.tidyverse.org/ which really is just an R wrapper to the package https://github.com/WizardMac/ReadStat which I think is written in C. — Tom, Mar 14 '19 at 02:23

score 0 · Answer 1 · answered Mar 17 '19 at 20:01

I circled back and took another swing at the C implementations provided at the end of the SAS transport documentation. It turns out the issue wasn't with my implementation; it was an issue with my tests.

TL;DR These are my 64-bit implementations:

public static void writeIBMDouble(double value, byte[] data, int offset) {
    long ieee8 = Double.doubleToLongBits(value);
    long ieee1 = (ieee8 >>> 32) & 0xFFFFFFFFL;
    long ieee2 = ieee8 & 0xFFFFFFFFL;
    writeLong(0L, data, offset);
    long xport1 = ieee1 & 0x000FFFFFL;
    long xport2 = ieee2;
    int ieee_exp = 0;
    if (xport2 != 0 || ieee1 != 0) {
        ieee_exp = (int)(((ieee1 >>> 16) & 0x7FF0) >>> 4) - 1023;
        int shift = ieee_exp & 0x3;
        xport1 |= 0x00100000L;
        if (shift != 0) {
            xport1 <<= shift;
            xport1 |= ((byte)(((ieee2 >>> 24) & 0xE0) >>> (5 + (3 - shift))));
            xport2 <<= shift;
        }
        xport1 |= (((ieee_exp >>> 2) + 65) | ((ieee1 >>> 24) & 0x80)) << 24;
    }
    if (-260 <= ieee_exp && ieee_exp <= 248) {
        long temp = ((xport1 & 0xFFFFFFFFL) << 32) | (xport2 & 0xFFFFFFFFL);
        writeLong(temp, data, offset);
        return;
    }
    writeLong(0xFFFFFFFFFFFFFFFFL, data, offset);
    if (ieee_exp > 248) {
        data[offset] = 0x7F;
    }
}

public static void writeLong(long value, byte[] buffer, int offset) {
    buffer[offset] = (byte)(value >>> 56);
    buffer[offset + 1] = (byte)(value >>> 48);
    buffer[offset + 2] = (byte)(value >>> 40);
    buffer[offset + 3] = (byte)(value >>> 32);
    buffer[offset + 4] = (byte)(value >>> 24);
    buffer[offset + 5] = (byte)(value >>> 16);
    buffer[offset + 6] = (byte)(value >>> 8);
    buffer[offset + 7] = (byte)value;
}

And:

public static double readIBMDouble(byte[] data, int offset) {
    long temp = readLong(data, offset);
    long ieee = 0L;
    long xport1 = temp >>> 32;
    long xport2 = temp & 0x00000000FFFFFFFFL;
    long ieee1 = xport1 & 0x00ffffff;
    long ieee2 = xport2;
    if (ieee2 == 0L && xport1 == 0L) {
        return Double.longBitsToDouble(ieee);
    }
    int shift = 0;
    int nib = (int)xport1;
    if ((nib & 0x00800000) != 0) {
        shift = 3;
    } else if ((nib & 0x00400000) != 0) {
        shift = 2;
    } else if ((nib & 0x00200000) != 0) {
        shift = 1;
    }
    if (shift != 0) {
        ieee1 >>>= shift;
        ieee2 = (xport2 >>> shift) | ((xport1 & 0x00000007) << (29 + (3 - shift)));
    }
    ieee1 &= 0xffefffff;
    ieee1 |= (((((long)(data[offset] & 0x7f) - 65) << 2) + shift + 1023) << 20) | (xport1 & 0x80000000);
    ieee = ieee1 << 32 | ieee2;
    return Double.longBitsToDouble(ieee);
}

public static long readLong(byte[] buffer, int offset) {
    long result = unsignedByteToLong(buffer[offset]) << 56;
    result |= unsignedByteToLong(buffer[offset + 1]) << 48;
    result |= unsignedByteToLong(buffer[offset + 2]) << 40;
    result |= unsignedByteToLong(buffer[offset + 3]) << 32;
    result |= unsignedByteToLong(buffer[offset + 4]) << 24;
    result |= unsignedByteToLong(buffer[offset + 5]) << 16;
    result |= unsignedByteToLong(buffer[offset + 6]) << 8;
    result |= unsignedByteToLong(buffer[offset + 7]);
    return result;
}

private static long unsignedByteToLong(byte value) {
    return (long)value & 0xFF;
}

These are basically a one-to-one translation from what's in the document, except I convert the byte[] into a long up-front and just do bit-twiddling instead of working directly with bytes.
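As an independent cross-check on the bit-twiddling translation, the 32-bit loop approach from the question can be widened to 64 bits. What follows is a minimal sketch, not the code from the SAS document: the names `IbmDoubleSketch`, `toIbmBits`, and `fromIbmBits` are made up for illustration, and it assumes that zero, NaN, infinity, and underflow should all become zero bits while overflow should throw. A double is 1.f x 2^(e - 1023), which is the same as 0.1f x 2^(e - 1022); that is the 64-bit counterpart of the 126 (= 127 - 1) in the 32-bit version. The `& 0x3` loop raises the binary exponent to a multiple of 4 so that it can be written as a power of 16.

```java
final class IbmDoubleSketch {
    /** Hypothetical helper: the 64-bit IBM hex-float bit pattern for a double. */
    static long toIbmBits(double value) {
        // Assumption: zero, NaN and infinity map to zero, as in the 32-bit version.
        if (value == 0.0 || Double.isNaN(value) || Double.isInfinite(value)) {
            return 0L;
        }
        long bits = Double.doubleToLongBits(value);
        long sign = bits & 0x8000000000000000L;
        int biased = (int) ((bits >>> 52) & 0x7FF);
        if (biased == 0) {
            return 0L; // subnormal doubles are far below the IBM range
        }
        // 53-bit significand (implicit 1 restored), widened to the 56-bit IBM fraction.
        long mantissa = ((bits & 0x000FFFFFFFFFFFFFL) | 0x0010000000000000L) << 3;
        // value = (mantissa / 2^56) * 2^t, with t = biased - 1022
        int t = biased - 1022;
        while ((t & 0x3) != 0) { // make t a multiple of 4 (a power of 16)
            ++t;
            mantissa >>>= 1;     // at most 3 shifts, so no significant bits are lost
        }
        int ibmExponent = (t >> 2) + 64; // excess-64 exponent, base 16
        if (ibmExponent < 0) {
            return 0L; // assumption: underflow becomes zero
        }
        if (ibmExponent > 127) {
            throw new ArithmeticException("value too large for IBM double: " + value);
        }
        return sign | ((long) ibmExponent << 56) | mantissa;
    }

    /** Hypothetical helper: decodes a 64-bit IBM hex-float bit pattern arithmetically. */
    static double fromIbmBits(long ibm) {
        long mantissa = ibm & 0x00FFFFFFFFFFFFFFL;
        int exponent = (int) ((ibm >>> 56) & 0x7F) - 64;
        // scalb multiplies by an exact power of two.
        double result = Math.scalb((double) mantissa, 4 * exponent - 56);
        return (ibm < 0) ? -result : result;
    }
}
```

With this sketch, `toIbmBits(1.0)` gives `0x4110000000000000L` and `toIbmBits(-118.625)` gives `0xC276A00000000000L`, which makes handy test vectors for comparing against the translated SAS routines.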

I also realized the code in the documentation includes some special cases for "missing" values that are specific to the SAS transport standard and have nothing to do with IBM hexadecimal floating point numbers. In fact, the bit sequence that code produces for a missing value is simply read by the Double.longBitsToDouble method as NaN. I moved this code out since it wasn't going to work anyway.
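For readers who do need the missing-value cases, here is a minimal sketch of a check that could run before the numeric conversion. It assumes the layout described in the SAS transport documentation as I understand it: the first byte holds the ASCII code of the missing-value indicator (`.`, `_`, or `A` through `Z`) and the remaining seven bytes are zero. The class and method names are hypothetical, and the layout should be verified against the SAS document before relying on it.

```java
final class XportMissingSketch {
    /**
     * Hypothetical helper: true if the 8 bytes at offset look like a SAS
     * transport "missing" numeric (assumed layout: indicator byte + 7 zero bytes).
     */
    static boolean isMissing(byte[] data, int offset) {
        int first = data[offset] & 0xFF; // mask so the byte is treated as unsigned
        boolean indicator = first == '.' || first == '_'
                || (first >= 'A' && first <= 'Z');
        if (!indicator) {
            return false;
        }
        for (int i = 1; i < 8; i++) {
            if (data[offset + i] != 0) {
                return false; // a non-zero fraction means this is a real number
            }
        }
        return true;
    }
}
```

Checking the trailing bytes matters because an indicator such as `A` (0x41) is also a valid exponent byte: 1.0 is stored as 0x41 0x10 followed by zeros, and it must not be reported as missing.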

The good thing is that as part of this exercise I learned a lot of tricks for bit manipulation in Java. For instance, a lot of the issues I ran into involving sign were resolved by using the >>> operator instead of the >> operator. Other than that, you just need to be careful when upcasting to mask with 0xFF, 0xFFFF, etc. so that the sign extension is discarded.
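A small illustration of both pitfalls, using the sign/exponent byte 0xC2 as sample data. The class and method names are made up for the example; `Byte.toUnsignedLong` is the standard-library equivalent of the manual mask (Java 8 and later).

```java
final class SignExtensionDemo {
    /** Widens a byte to long without sign extension (same as Byte.toUnsignedLong). */
    static long toUnsigned(byte b) {
        return b & 0xFFL;
    }

    /** Extracts the top byte of a 32-bit word using a logical shift. */
    static int topByte(int word) {
        return word >>> 24;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC2;
        System.out.println((long) b);      // -62: plain widening sign-extends
        System.out.println(toUnsigned(b)); // 194: the mask discards the extension

        int word = 0xC276A000;
        System.out.println(Integer.toHexString(word >> 24)); // ffffffc2: arithmetic shift drags the sign
        System.out.println(Integer.toHexString(topByte(word))); // c2: logical shift fills with zeros
    }
}
```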

I also learned about ByteBuffer, which can facilitate loading back and forth among byte[] and primitives/strings and would handle any endianness issues; however, it comes with some minor overhead. It turns out endianness wasn't a concern here anyway: the shift-based readLong and writeLong above always treat the first byte as the most significant, which is the big-endian order the transport format uses, regardless of the host architecture.
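For comparison, a minimal sketch of the same two helpers written with `java.nio.ByteBuffer`. The class name is hypothetical; the byte order is set explicitly for clarity, although big-endian is already ByteBuffer's default.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferLongDemo {
    /** Reads 8 bytes starting at offset as a big-endian long. */
    static long readLong(byte[] buffer, int offset) {
        return ByteBuffer.wrap(buffer, offset, Long.BYTES)
                .order(ByteOrder.BIG_ENDIAN)
                .getLong();
    }

    /** Writes value into 8 bytes starting at offset, most significant byte first. */
    static void writeLong(long value, byte[] buffer, int offset) {
        ByteBuffer.wrap(buffer, offset, Long.BYTES)
                .order(ByteOrder.BIG_ENDIAN)
                .putLong(value);
    }
}
```

Each call wraps the existing array rather than copying it, so the overhead is a short-lived buffer object per value; the hand-written shifts avoid even that.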

It seems reading/writing SAS transport files is a pretty common need, especially in the clinical trials arena, so hopefully anyone working in Java/C# won't have to go through the trouble I did.

If you use unsigned types, they zero-extend automatically when widening. And right shifts are logical (not arithmetic). You can do bitfield insert/extract on signed types with shifts/AND/OR, but it makes more sense to use unsigned. — Peter Cordes, Mar 17 '19 at 20:23
