Porting MATLAB's Reed Solomon function to Java

Question

I have implemented a simple RS error correction scheme in MATLAB with RS(160,80). The basic process is as follows:

I generate a message with length 80 and 8 bits per symbol, and I generate an RS code of length 160.
After generating the RS code, I add/XOR another Galois field array of length 160 to it (its elements are only 00000000 or 00000001). This simulates adding errors in the scheme and produces my noisy code.
Now, I take another GF array of the same type (elements 00000000 or 00000001) which differs in fewer than 40 symbols from the one I used to create the noise. I add/XOR this to the noisy code generated in the previous step.
Finally, I run it through the RS decoder which retrieves my original message.
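
The noise steps above can be sketched in Java on their own, leaving out the RS encoder and decoder. This is only an illustration under assumptions: the class and method names are made up, a random array stands in for the RS(160,80) codeword, and the two fingerprints are generated so that they differ in exactly 30 positions. It shows that after XORing with both fingerprints, the codeword is damaged only where the fingerprints disagree, which is what the decoder then has to correct.

```java
import java.util.Random;

class FingerprintNoise {
    // XOR each codeword byte with the matching fingerprint bit (0 or 1).
    static byte[] xorFingerprint(byte[] code, int[] fingerprintBits) {
        byte[] out = code.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] ^= (byte) fingerprintBits[i];
        }
        return out;
    }

    // Count the positions where two equal-length arrays differ.
    static int countDifferences(byte[] a, byte[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        byte[] code = new byte[160];      // stand-in for an RS(160,80) codeword
        rng.nextBytes(code);

        int[] fp1 = new int[160];         // hypothetical fingerprint of device 1
        for (int i = 0; i < fp1.length; i++) {
            fp1[i] = rng.nextInt(2);
        }
        int[] fp2 = fp1.clone();          // device 2: same, except 30 flipped bits
        for (int i = 0; i < 30; i++) {
            fp2[i * 5] ^= 1;
        }

        byte[] noisy = xorFingerprint(code, fp1);        // sent from dev1 to dev2
        byte[] decommitted = xorFingerprint(noisy, fp2); // noise removed by dev2

        // Only the 30 positions where the fingerprints differ remain corrupted.
        System.out.println(countDifferences(code, decommitted));
    }
}
```

With at most 40 differing positions, the remaining symbol errors stay within what 80 parity symbols can correct.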

My MATLAB function:

function RSKeyExchange(dev1, dev2)
    dev1_fp = zeros(1,160);
    dev2_fp = zeros(1,160);

    for i=1:160
        dev1_fp(i) = str2double(dev1.key(i));
        dev2_fp(i) = str2double(dev2.key(i));
    end

    n = 160;        % total code length
    k = 80;         % message length - actual message for synchronisation
    m = 8;          % order (2^m)

    % n-k = 80 parity symbols, so the RS decoder can correct up to
    % t = (n-k)/2 = 40 symbol errors

    msg = gf(randi([1 255],1 ,80), m);
    code = rsenc(msg, n, k);

    noise_add = gf(dev1_fp, 8);
    noise_remove = gf(dev2_fp, 8);

    noisy_code = code + noise_add;

    % noisy_code is now transmitted from dev1 to dev2 (sender to receiver)

    decommited_code = noisy_code + noise_remove;
    [rxcode, cnumerr] = rsdec(decommited_code, n, k);

    fprintf('Number of errors corrected: %d\n', cnumerr);
end

Now I have been searching for ways to port this to Java. I have looked up libraries, but I'm not sure how to exactly port my particular use case.

Zxing - Takes only QR and Aztec codes as input
Backblaze - JavaReedSolomon - Fixes code erasures, which isn't the kind of error I'm producing, and the input is in the form of files (seriously confused as to what is happening here)
Simple RS error correction example - Feels a bit more legible, but takes only strings as inputs. I feel I can modify it to suit my use case, but I'm not sure how to go about adding noise. I'm not sure how to go about generating an RS(160, 80) code via this implementation, nor can I tell how to generate custom GF fields to add noise.

Any help would be appreciated (especially if you could point me to an implementation that would suit my use case, or help modify one of the resources above so that it works).

Thanks!

rcgldr · Accepted Answer · 2017-07-15T07:17:13.133

I looked at the "simple RS" example "GF28" Java code. The decoder appears to handle erasures only (one of the inputs is an array of bad indices). It's using GF(256) based on hex 11B = x^8 + x^4 + x^3 + x + 1, the same as AES encryption. It is a somewhat unusual choice, since the lowest "primitive" for that field is 3 (all numbers other than zero can be considered to be a power of 3), rather than one of the fields where the "primitive" is 2. The field polynomial is defined via PX, so it can be easily changed. I'm not sure why it generates tables dynamically instead of generating them during initialization, using a second set of true/false tables to indicate if specific table values have been generated.
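
The point about the "primitive" can be checked with a few lines of Java. This is a minimal sketch, not code from the GF28 example: the class and method names are made up, and 0x11D is included only as an example of a field polynomial where 2 is primitive. It multiplies in GF(256) modulo a given 9-bit field polynomial and counts how many multiplications it takes for an element to cycle back to 1.

```java
class Gf256Order {
    // Multiply two GF(256) elements modulo the given 9-bit field polynomial.
    static int mul(int a, int b, int poly) {
        int product = 0;
        while (b != 0) {
            if ((b & 1) != 0) {
                product ^= a;          // add (xor) the current multiple of a
            }
            a <<= 1;                   // multiply a by x
            if ((a & 0x100) != 0) {
                a ^= poly;             // reduce modulo the field polynomial
            }
            b >>= 1;
        }
        return product;
    }

    // Smallest n > 0 such that element^n == 1 (element must be nonzero).
    static int order(int element, int poly) {
        int value = element;
        int n = 1;
        while (value != 1) {
            value = mul(value, element, poly);
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(order(2, 0x11B)); // 51: 2 does not generate all 255 nonzero values
        System.out.println(order(3, 0x11B)); // 255: 3 is primitive for hex 11B
        System.out.println(order(2, 0x11D)); // 255: 2 is primitive for hex 11D
    }
}
```

An element is "primitive" exactly when its order is 255, so that its powers cover every nonzero value of the field.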

I have a C RS demo program for 8 bit GF(256) fields. It's interactive: you select a field (there are 30 of them), whether to use a self-reciprocal polynomial (it's usually not used), whether the first consecutive root of the generator polynomial is 1 (if no is specified, then the first consecutive root is the "primitive"), the number of parity bytes, and the number of data bytes. It handles both erasures and errors. Since it's a demo program, it includes code for the 3 main types of decoder algorithms (the Peterson matrix algorithm, Sugiyama's adaptation of the extended Euclid algorithm, and the Berlekamp-Massey algorithm), plus the Forney algorithm to generate error values. The interactive part could be replaced with code that selects all these parameters. For your situation, change the define for MAXPAR from 20 to 80 (maximum number of parities). The user input is via stdin, so a text input file can be used to run the demo.

http://rcgldr.net/misc/eccdemo8.zip

In my demo, to generate a code word, the user enters values (user option "E" or "C"), then enters "N" to encode. To generate errors, the user enters values at specific locations. To generate erasures, the user uses the "P" option to enter erasure values at specific locations. "F" is used to fix (correct) the code word.

The wiki article includes the logic for the 3 main types of decoders. It also explains the Fourier transform, but the Fourier transform requires using one of the other decoders in order to generate the error polynomial, and isn't practical.

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Error_correction_algorithms

Based on your comment I looked at the Zxing library. The method descriptions are a bit terse and use the wrong terminology. Here's a redo of the description:

GenericGF(primitive, size, b)
    primitive - wrong term, this is the field polynomial

    size - The description is "size of the field". That is already
    determined by the polynomial (an n+1 bit polynomial is used
    for an n bit field), but the library's predefined fields pass the
    number of field elements here, e.g. 256 for QR_CODE_FIELD_256.

    b - alpha raised to the b power is the first consecutive root.
    alpha is the primitive for the particular field. Unlike the
    comment, b == 0 is about as common as b == 1.

https://en.wikipedia.org/wiki/Primitive_element_(finite_field)

encode(int[] toEncode, int ecBytes)
    toEncode - toEncode includes space for the "parity" data used
    to encode the message.

    ecBytes - the number of "parity" elements at the end of toEncode.
    Why is it named ecBytes when it is an array of ints?

decode(int[] received, int twoS )
    received - a single received codeword (data followed by the
    "parity" elements), matching what encode generates. Errors
    are corrected in place.

    twoS - the number of "parity" elements in received, which is
    twice the number of errors that can be corrected.

    Based on the references, it is using the Sugiyama adaptation
    of extended Euclid algorithm. A side benefit is that the
    error evaluator polynomial is generated during the process.

The wrapper comment also has an error. The maximum size for a GF(256) codeword is 255 bytes, since error locations are limited to the range 0 to 254. This is because locations are related to powers of the primitive, and any nonzero number raised to the power 255 in GF(256) is 1, so location 255 would be indistinguishable from location 0.

Note that a mis-correction is possible without triggering the library's exception. However with 80 parity bytes, it would take 41 errors for this to be possible. The mis-correction would involve generating a set of 40 locations and values that produce a valid codeword, but one that differs from the original codeword by 81 or more bytes (81 or more errors). However with a shortened codeword of 160 bytes, all 40 of the generated locations would have to be in the range 0 to 159. Assuming random, uniform distribution of the calculated locations, the odds of them mis-correcting would be ((160!)/((160-40)!))/(255^40) ≈ 3.86 × 10^-11. Mis-correction is an issue if using full length (255) codewords, or a smaller number of parities.
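
The factorials in that estimate overflow ordinary integer types, but the ratio can be evaluated as a running product of 40 fractions. A minimal sketch (the class and method names are made up for illustration):

```java
class MiscorrectionOdds {
    // Probability that 'locations' distinct, uniformly distributed error
    // locations out of 'fieldLocations' possible ones all fall inside a
    // shortened codeword of 'codewordLength' symbols:
    // (L / F) * ((L - 1) / F) * ... * ((L - locations + 1) / F)
    static double odds(int codewordLength, int locations, int fieldLocations) {
        double p = 1.0;
        for (int i = 0; i < locations; i++) {
            p *= (double) (codewordLength - i) / fieldLocations;
        }
        return p;
    }

    public static void main(String[] args) {
        // 160-byte codeword, 40 computed locations, 255 possible locations
        System.out.println(odds(160, 40, 255)); // roughly 3.86e-11
    }
}
```

Calling odds(255, 40, 255) instead gives the corresponding figure for a full length codeword, which is many orders of magnitude larger.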

I made a Java RS ecc GF(256) example. It doesn't handle erasures (which requires modifying syndromes for known erasure locations, then merging the error locators with the erasure locators). It uses the extended Euclid algorithm to calculate the error locator and error evaluator polynomials. It's commented, but you may need to look at the Wikipedia RS article to better understand it. The code deals with Java signed bytes by converting them to unsigned integers using byte&0xff as needed. I set the parameters to 80 message bytes, 80 parity bytes to match the question's example. Although both add and subtract are xor for GF(256), the code uses separate functions for add and subtract so that the code corresponds to the algorithm as it would apply to GF(p) where p is a prime number, such as GF(257). The calculation for the "derivative" of Lambda would also be different for GF(p), this is explained here:

https://en.wikipedia.org/wiki/Forney_algorithm#Formal_derivative
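
As a rough illustration of why the "derivative" is special in GF(2^m): in the formal derivative, the coefficient of x^i is multiplied by the integer i, meaning it is added to itself i times. With xor as addition, that leaves the coefficient unchanged for odd i and gives zero for even i. The sketch below is not taken from rsecc8.zip; the class name and the low-order-first coefficient layout are assumptions made for the example.

```java
class FormalDerivative {
    // Formal derivative of a polynomial over GF(2^m).
    // poly[i] is the coefficient of x^i (lowest order first).
    static int[] derivative(int[] poly) {
        if (poly.length <= 1) {
            return new int[] {0};       // derivative of a constant is zero
        }
        int[] result = new int[poly.length - 1];
        for (int i = 1; i < poly.length; i++) {
            // i * c = c xor c xor ... (i times): c when i is odd, 0 when i is even
            result[i - 1] = ((i & 1) != 0) ? poly[i] : 0;
        }
        return result;
    }

    public static void main(String[] args) {
        // Lambda(x) = 1 + 0x1D x + 0x53 x^2 + 0xA7 x^3 (arbitrary sample values)
        int[] lambda = {0x01, 0x1D, 0x53, 0xA7};
        // Lambda'(x) = 0x1D + 0 x + 0xA7 x^2
        System.out.println(java.util.Arrays.toString(derivative(lambda)));
    }
}
```

For GF(p) with p an odd prime, each coefficient would instead be multiplied by i modulo p, so the even-power terms would not vanish.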

Link to java rsecc GF(256) example:

http://rcgldr.net/misc/rsecc8.zip
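
For readers who only want to see the shape of the encoding side, here is a separate, much smaller sketch. It is not the code in rsecc8.zip, and it has no decoder. The assumptions are mine: field polynomial hex 11D, "primitive" 2, first consecutive root alpha^1, and made-up class and method names. It builds the generator polynomial, appends 80 parity bytes to 80 message bytes, and computes syndromes, which are all zero for a valid codeword. Signed Java bytes are converted with & 0xff as described above.

```java
class RsEncodeSketch {
    static final int POLY = 0x11D;             // assumed field polynomial
    static final int[] EXP = new int[512];     // EXP[i] = alpha^i, alpha = 2
    static final int[] LOG = new int[256];

    static {
        int x = 1;
        for (int i = 0; i < 255; i++) {
            EXP[i] = x;
            LOG[x] = i;
            x <<= 1;
            if ((x & 0x100) != 0) x ^= POLY;
        }
        for (int i = 255; i < 512; i++) EXP[i] = EXP[i - 255];
    }

    static int mul(int a, int b) {
        return (a == 0 || b == 0) ? 0 : EXP[LOG[a] + LOG[b]];
    }

    // Generator polynomial with roots alpha^1 .. alpha^parity, highest order first.
    static int[] generator(int parity) {
        int[] g = {1};
        for (int r = 1; r <= parity; r++) {
            int[] next = new int[g.length + 1];
            for (int i = 0; i < g.length; i++) {
                next[i] ^= g[i];                   // g(x) * x
                next[i + 1] ^= mul(g[i], EXP[r]);  // g(x) * alpha^r
            }
            g = next;
        }
        return g;
    }

    // Systematic encode: message bytes followed by 'parity' remainder bytes.
    static byte[] encode(byte[] message, int parity) {
        int[] g = generator(parity);
        int[] rem = new int[parity];               // remainder, highest order first
        for (byte mb : message) {
            int feedback = (mb & 0xff) ^ rem[0];
            System.arraycopy(rem, 1, rem, 0, parity - 1);  // shift left by one
            rem[parity - 1] = 0;
            for (int i = 0; i < parity; i++) {
                rem[i] ^= mul(feedback, g[i + 1]);
            }
        }
        byte[] codeword = new byte[message.length + parity];
        System.arraycopy(message, 0, codeword, 0, message.length);
        for (int i = 0; i < parity; i++) {
            codeword[message.length + i] = (byte) rem[i];
        }
        return codeword;
    }

    // Evaluate the codeword at alpha^1 .. alpha^parity (Horner's rule).
    static int[] syndromes(byte[] codeword, int parity) {
        int[] s = new int[parity];
        for (int r = 1; r <= parity; r++) {
            int acc = 0;
            for (byte cb : codeword) {
                acc = mul(acc, EXP[r]) ^ (cb & 0xff);
            }
            s[r - 1] = acc;
        }
        return s;
    }

    public static void main(String[] args) {
        byte[] message = new byte[80];
        for (int i = 0; i < message.length; i++) message[i] = (byte) (i * 3 + 1);
        byte[] codeword = encode(message, 80);     // 160 bytes
        System.out.println(java.util.Arrays.stream(syndromes(codeword, 80)).allMatch(v -> v == 0));
        codeword[17] ^= 1;                         // flip one bit, as the noise does
        System.out.println(java.util.Arrays.stream(syndromes(codeword, 80)).allMatch(v -> v == 0));
    }
}
```

A decoder starts from these syndromes: if they are not all zero, the error locator and error evaluator polynomials are derived from them.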

The C version is easier to follow since it's interactive, displays some of the internal calculations (defines can be changed to enable / disable what is displayed), and lets the user try different variations.

Hey, thanks! Though I solved my problem using the Zxing library (found a wrapper that does a fair bit of the heavy lifting), your resource is very useful. — doberoi96, Jul 12 '17 at 15:24
@doberoi96 - I updated my answer with comments about the Zxing library. — rcgldr, Jul 13 '17 at 01:24
Wow, thanks for the description + the correction. The description is basically what I needed all along. About the mis-correction, I was being safe when I chose 40 errors as my threshold. This is actually a cryptographic scheme, and incorrect authentication attempts give more in the line of ~70 errors, while correct ones stick under 30. So for all practical purposes, it should do. But seriously, thanks, great catch. — doberoi96, Jul 13 '17 at 03:25
@doberoi96 - I added a link to a java RS ecc test program, that is similar to the one in the question. — rcgldr, Jul 14 '17 at 21:53
@doberoi96 - There was a fix needed for eccdemo8.c. In GenpErrorsB(), just after `j = vBx.size - vC.size; /* right shift vBx or vC */`, `j` needs to be cast from BYTE (unsigned) to char (signed) in order to compare for < 0 or > 0. This is now fixed. The link to the source code is the same as before. The Java example did not have this issue. — rcgldr, Aug 16 '17 at 07:58

score 0 · Answer 2 · answered Jul 12 '17 at 15:41

Okay, so I was able to solve my problem. I compromised a little so it isn't an exact port, and is pretty hacked up. Either way, it works for my use case.

I found a wrapper library for Zxing which does a fair bit of the heavy lifting.

My MATLAB programme builds its field with gf(..., 8), that is GF(2^8) = GF(256). The wrapper uses GenericGF.QR_CODE_FIELD_256, which is also a GF(256) field.

This is the java implementation which basically achieves the same thing. (MsgKeyPair is just a data structure holding two strings)

public MsgKeyPair generateKey(String fingerprint){
    EncoderDecoder encoderDecoder = new EncoderDecoder();

    try {
        String message = generateMessage();
        byte[] data = message.getBytes();

        byte[] encodedData = encoderDecoder.encodeData(data, 80);
        byte[] noisyData = encodedData.clone();

        // Flip the lowest bit of every byte whose fingerprint character is '1'.
        // This is the XOR with 00000001 from the MATLAB version.
        for(int i=0; i<encodedData.length; i++){
            if(fingerprint.charAt(i) == '1') {
                noisyData[i] = (byte)(encodedData[i] ^ 0x01);
            }
        }

        String keyToSend = new String(Base64.encode(noisyData, Base64.DEFAULT));
        return new MsgKeyPair(message, keyToSend);
    }
    catch (Exception e){
        Log.e(TAG, "generateKey: ", e);
    }

    return new MsgKeyPair("", "");
}

public String KeyDecoder(String fingerprint, String key) throws Exception{
    byte[] noisyData = Base64.decode(key, Base64.DEFAULT);

    byte[] decomCode = noisyData.clone();
    // Undo the noise: flip the lowest bit wherever this fingerprint has a '1'.
    for(int i=0; i<noisyData.length; i++){
        if(fingerprint.charAt(i) == '1'){
            decomCode[i] = (byte)(noisyData[i] ^ 0x01);
        }
    }

    EncoderDecoder encoderDecoder = new EncoderDecoder();

    byte[] decodedData = encoderDecoder.decodeData(decomCode, 80);
    String originalMessage = new String(decodedData);

    return originalMessage;
}

public static String generateMessage(){
    // Build an 80-character string of random '0' / '1' characters.
    StringBuilder msg = new StringBuilder(80);

    for(int i=0; i<80; i++){
        msg.append(randomWithRange(0, 1));
    }

    return msg.toString();
}

private static int randomWithRange(int min, int max)
{
    int range = (max - min) + 1;
    return (int)(Math.random() * range) + min;
}

I'm not quite following this. Why is the name Base64? Is the size of encodedData 160 bytes (80 data bytes, 80 "parity" (ecc) bytes)? Some of the functions are not included in this code example. I assume somewhere there's a conversion from byte[] to work with the int[] used by the Zxing library? — rcgldr, Jul 12 '17 at 22:31
If there's interest, I could port a subset of my RS demo code to Java to only include the encoder and one of the decoder algorithms, probably the extended Euclid algorithm, since it also generates the error value polynomial. — rcgldr, Jul 13 '17 at 01:08
@rcgldr - Sorry, should have elaborated. 1) The Base64 bit doesn't have anything to do with the RS encoding/decoding. I was just converting my byte array to a string before passing it from the first function. 2) Yep, encodedData is 160 bytes. 3) I'm not sure, really. I basically worked around the wrapper I linked in the answer. And about the port, it would be great if you could do so! I was about to get to the error values, but decided to leave it for later. I would seriously appreciate it if you could port a subset. — doberoi96, Jul 13 '17 at 03:17
