
For a project I need to implement a function in Android (with Java) that generates an AES-based CBC-MAC of a file. The function takes successive 'blocks' (chunks) of the file, calculates an identifier for each chunk, and finally combines them into an identifier for the whole file.

The function works, but for bigger files it is extremely slow (it can take minutes to hours) because of the loops. My knowledge of cryptography doesn't go very far, so I'm not sure how to improve the speed or whether that is even possible. The output is exactly the same CBC-MAC that libraries in other programming languages produce, so the result itself is correct.

Unfortunately I'm quite limited in which external libraries I can use. The class CBCBlockCipherMac from Bouncy Castle is an option, since I was able to include it with only a few dependencies, but I never got it to produce the same output as the function below.

All feedback is welcome; I've been trying to solve this for 3 days now but can't figure it out. Thanks!

*Update: It seems that the function str_to_a32, called inside the loop over every 16 bytes, causes most of the slowdown, so making that function faster would mostly solve the problem. Unfortunately, looping over every 16 bytes is necessary, since I'm implementing the same CBC-MAC function that the cloud provider Mega implements.

The code

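import java.io.*;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Random;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import static java.util.Arrays.copyOfRange;

    //TEST IMPLEMENTATION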

    String _path_to_file = "";

    Random _random = new Random();
    long[] _key_file = new long[4];
    _key_file[0] = _random.nextInt(Integer.MAX_VALUE);
    _key_file[1] = _random.nextInt(Integer.MAX_VALUE);
    _key_file[2] = _random.nextInt(Integer.MAX_VALUE);
    _key_file[3] = _random.nextInt(Integer.MAX_VALUE);

    long[] _iv_file = new long[4];
    _iv_file[0] = _random.nextInt(Integer.MAX_VALUE);
    _iv_file[1] = _random.nextInt(Integer.MAX_VALUE);
    _iv_file[2] = 0;
    _iv_file[3] = 0;

    long[] _returned = cbc_mac(_path_to_file, _key_file, _iv_file);


//FUNCTIONS

//this function loops over the parts of the file to calculate the cbc-mac and is the problem
public static long[] cbc_mac(String _path, long[] k, long[] n) throws Exception {
    File _file = new File(_path);
    long _file_length = _file.length();
    RandomAccessFile _raf = new RandomAccessFile(_file, "r");

    //This works fine and fast
    ArrayList<chunksData> chunks = get_chunks(_file_length);

    long[] file_mac = new long[4];
    file_mac[0] = 0;
    file_mac[1] = 0;
    file_mac[2] = 0;
    file_mac[3] = 0;

    //prepare encrypt
    String iv = "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
    IvParameterSpec ivSpec = new IvParameterSpec(iv.getBytes());
    SecretKeySpec keySpec = new SecretKeySpec(a32_to_str(k).getBytes("ISO-8859-1"), "AES");
    Cipher cipher = Cipher.getInstance("AES/CBC/NOPADDING");
    cipher.init(Cipher.ENCRYPT_MODE, keySpec, ivSpec);
    //end prepare encrypt

    for(chunksData _chunksData : chunks) {

        int pos = (int)_chunksData._key;
        int size = (int)_chunksData._value;

        long[] chunk_mac = new long[4];
        chunk_mac[0] = n[0];
        chunk_mac[1] = n[1];
        chunk_mac[2] = n[0];
        chunk_mac[3] = n[1];

        byte[] bytes = new byte[16];

        //this loop is the really slow part since it loops over every 16 bytes
        for (int i = pos; i < pos + size; i += 16) {
            _raf.seek(i);
            int _did_read = _raf.read(bytes, 0, 16);
            if(_did_read != 16) {
                for(int o = _did_read;o<16;o++) {
                    bytes[o] = (byte)((char)'\0');
                }
            }

            long[] block = str_to_a32(new String(bytes, "ISO-8859-1"));

            chunk_mac[0] = chunk_mac[0] ^ block[0];
            chunk_mac[1] = chunk_mac[1] ^ block[1];
            chunk_mac[2] = chunk_mac[2] ^ block[2];
            chunk_mac[3] = chunk_mac[3] ^ block[3];

            chunk_mac = str_to_a32(new String(cipher.doFinal(a32_to_str(chunk_mac).getBytes("ISO-8859-1")), "ISO-8859-1"));

        }

        file_mac[0] = file_mac[0] ^ chunk_mac[0];
        file_mac[1] = file_mac[1] ^ chunk_mac[1];
        file_mac[2] = file_mac[2] ^ chunk_mac[2];
        file_mac[3] = file_mac[3] ^ chunk_mac[3];
        file_mac = str_to_a32(new String(cipher.doFinal(a32_to_str(file_mac).getBytes("ISO-8859-1")), "ISO-8859-1"));

    }

    _raf.close();

    return file_mac;

}

//this function works fine and fast
public static ArrayList<chunksData> get_chunks(long size) {

    ArrayList<chunksData> chunks = new ArrayList<chunksData>();

    long p = 0;
    long pp = 0;

    for (int i = 1; i <= 8 && p < size - i * 0x20000; i++) {
        chunksData chunks_temp = new chunksData(p, i*0x20000);
        chunks.add(chunks_temp);
        pp = p;
        p += chunks_temp._value;
    }

    while(p < size) {
        chunksData chunks_temp = new chunksData(p, 0x100000);
        chunks.add(chunks_temp);
        pp = p;
        p += chunks_temp._value;            
    }

    chunks.get(chunks.size()-1)._value = size-pp;
    if((int)chunks.get(chunks.size()-1)._value == 0) {
        chunks.remove(chunks.size()-1);
    }

    return chunks;

}

public static class chunksData {
    public long _key = 0;
    public long _value = 0;
    public chunksData(long _keyT, long _valueT){
        this._key = _keyT;
        this._value = _valueT;
    }
}

//helper function which also contains a loop and is used in the problematic loop, so might be a problem though I don't know how to speed it up
public static long[] str_to_a32(String string) {
    if (string.length() % 4 != 0) {
        string += new String(new char[4 - string.length() % 4]);
    }
    long[] data = new long[string.length() / 4];

    byte[] part = new byte[8];
    for (int k = 0, i = 0; i < string.length(); i += 4, k++) {
        String sequence = string.substring(i, i + 4);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            baos.write(sequence.getBytes("ISO-8859-1"));
            System.arraycopy(baos.toByteArray(), 0, part, 4, 4);
            ByteBuffer bb = ByteBuffer.wrap(part);
            data[k] = bb.getLong();
        } catch (IOException e) {
            data[k] = 0;
        }
    }
    return data;
}

//helper function which also contains a loop and is used in the problematic loop, so might be a problem though I don't know how to speed it up
public static String a32_to_str(long[] data) {
    byte[] part = null;
    StringBuilder builder = new StringBuilder();
    ByteBuffer bb = ByteBuffer.allocate(8);
    for (int i = 0; i < data.length; i++) {
        bb.putLong(data[i]);
        part = copyOfRange(bb.array(), 4, 8);
        bb.clear();
        ByteArrayInputStream bais = new ByteArrayInputStream(part);
        while (bais.available() > 0) {
            builder.append((char) bais.read());
        }
    }
    return builder.toString();
}
  • You are calculating a MAC for each separate block of the file. Why not just compute a single MAC for the entire file? Is the file ever handled as a set of blocks, or is it always treated as a single entity? Just encrypt the file with AES-CBC using a different key and a zero IV. The last block of the CBC encryption is your MAC for the whole file. – rossum Mar 20 '15 at 13:24
  • @rossum Thanks for the idea. To be more specific, I'm implementing the CBC-MAC function that the cloud provider Mega uses, and unfortunately Mega loops over every 16 bytes to calculate the MAC of the whole file. It seems that the function str_to_a32 takes the most time, so if that could be made faster it would be great, but I have no idea how. – Maarten Smit Mar 22 '15 at 14:41
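rossum's suggestion may apply to the per-chunk loop in the question as well. The question's cipher is initialized with a zero IV, and doFinal resets it after every 16-byte call, so each call is a single AES block encryption. The inner loop therefore computes state = AES(state XOR block), starting from (n[0], n[1], n[0], n[1]). That is exactly AES-CBC encryption with the starting value as the IV, keeping only the last ciphertext block. The sketch below checks this equivalence on made-up data. The class and method names are hypothetical, and the input is assumed to be zero-padded to a multiple of 16 bytes already, as the question does for the final block.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class CbcMacEquivalence {

    // Manual chaining, mirroring the question's inner loop:
    // state = AES(state XOR block) for every 16-byte block.
    static byte[] manualChain(byte[] key, byte[] iv, byte[] data) throws Exception {
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding"); // single-block AES
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] state = iv.clone();
        for (int off = 0; off < data.length; off += 16) {
            for (int j = 0; j < 16; j++) {
                state[j] ^= data[off + j];
            }
            state = aes.doFinal(state);
        }
        return state;
    }

    // The same value from one call: AES-CBC with the starting state as IV;
    // the last 16 bytes of the ciphertext are the MAC.
    static byte[] oneShotCbc(byte[] key, byte[] iv, byte[] data) throws Exception {
        Cipher cbc = Cipher.getInstance("AES/CBC/NoPadding");
        cbc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        byte[] ct = cbc.doFinal(data); // data length must be a non-zero multiple of 16
        return Arrays.copyOfRange(ct, ct.length - 16, ct.length);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        byte[] data = new byte[64];
        for (int i = 0; i < 16; i++) {
            key[i] = (byte) i;
            iv[i] = (byte) (0xA0 + i);
        }
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 7);
        }
        System.out.println(Arrays.equals(manualChain(key, iv, data), oneShotCbc(key, iv, data)));
    }
}
```

Running main prints `true`. If this holds for the real inputs, a chunk MAC could be computed with one doFinal per chunk (at most 1 MiB) instead of one per 16 bytes. The loop over blocks still happens, just inside the cipher provider. This is a sketch, not a verified drop-in for Mega's code.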

1 Answer


My main suspect is the seek operation in your first loop, combined with processing only 16 bytes at a time. I don't know the algorithm, but your code suggests that reading a full chunk is possible; you can then process it in parts as necessary.

Also, the chunks seem to be sequential (unless I'm missing something), so the whole file could be read sequentially without the seek.
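Sequential reading could be sketched as follows. It assumes the chunk list covers the file contiguously from offset 0, which the posted get_chunks appears to do, and that no chunk is larger than 1 MiB (0x100000, the largest size get_chunks uses). SequentialChunkReader, readChunks, and the ChunkConsumer callback are hypothetical names. DataInputStream.readFully blocks until the requested number of bytes has been read, or throws EOFException if the stream ends first.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.util.List;

public class SequentialChunkReader {

    // Hypothetical callback that receives each chunk's bytes in file order.
    interface ChunkConsumer {
        void accept(byte[] buf, int len) throws Exception;
    }

    // Reads the chunks back to back with no seek. The same buffer is reused
    // for every chunk, so the consumer must finish with it before returning.
    static void readChunks(String path, List<Long> chunkSizes, ChunkConsumer consumer) throws Exception {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path), 1 << 16))) {
            byte[] buf = new byte[0x100000]; // largest chunk size produced by get_chunks
            for (long size : chunkSizes) {
                int len = (int) size;
                in.readFully(buf, 0, len); // loops internally until len bytes are read
                consumer.accept(buf, len);
            }
        }
    }
}
```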

You don't need the ByteArrayOutputStream in your helper method. Creating substrings also has a cost, so calling getBytes once on the whole string and then picking out parts of the byte array will be more efficient.

The code below is roughly twice as fast as the original.

public long[] fast_str_to_a32(String string) throws UnsupportedEncodingException {
    if (string.length() % 4 != 0) {
        string += new String(new char[4 - string.length() % 4]);
    }
    long[] data = new long[string.length() / 4];

    byte[] bytes = string.getBytes("ISO-8859-1");

    byte[] part = new byte[8];
    ByteBuffer bb = ByteBuffer.wrap(part); 
    for (int k = 0, i = 0; i < bytes.length; i += 4, k++) {
        System.arraycopy(bytes, i, part, 4, 4);
        bb.rewind();
        data[k] = bb.getLong();
    }
    return data;
}

Also, in the main method you convert the bytes to a String only to convert them back to a byte[] at the beginning of str_to_a32; the method should take a byte[] as its input instead.
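A byte[]-based replacement for str_to_a32 could look like the following. The name bytesToA32 and the offset/length parameters are illustrative assumptions. It produces the same unsigned big-endian 32-bit words as str_to_a32, zero-padding a trailing partial word, but reads straight from the byte array. The file bytes never pass through a String.

```java
public class A32Words {

    // Converts len bytes of buf, starting at off, into big-endian 32-bit
    // words stored in longs (0 .. 2^32-1), like str_to_a32. Missing bytes
    // of a trailing partial word are treated as zero.
    static long[] bytesToA32(byte[] buf, int off, int len) {
        long[] words = new long[(len + 3) / 4];
        for (int i = 0; i < len; i++) {
            int shift = 24 - 8 * (i % 4); // first byte of each word is most significant
            words[i / 4] |= (buf[off + i] & 0xFFL) << shift;
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] block = {0x01, 0x02, 0x03, 0x04, (byte) 0xFF, 0x00, 0x00, 0x10};
        long[] w = bytesToA32(block, 0, block.length);
        System.out.println(Long.toHexString(w[0]) + " " + Long.toHexString(w[1]));
    }
}
```

Running main prints `1020304 ff000010`. In the question's loop, the call would be `bytesToA32(bytes, 0, 16)` in place of `str_to_a32(new String(bytes, "ISO-8859-1"))`.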

I still believe you should read the whole chunk at once and then process it in blocks of 16 bytes.
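Processing an in-memory chunk in 16-byte blocks, with no String or long[] conversions, might look like this. It is a sketch under two assumptions. First, the chunk bytes are already in a byte[]. Second, the cipher is "AES/ECB/NoPadding", which behaves like the question's zero-IV "AES/CBC/NOPADDING" cipher when doFinal is called on one block at a time. ChunkMac and chunkMac are hypothetical names. startState stands for the 16-byte big-endian form of (n[0], n[1], n[0], n[1]).

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class ChunkMac {

    // Computes one chunk MAC over chunk[0..len), 16 bytes at a time.
    // startState is copied, so the caller's array is not modified.
    static byte[] chunkMac(Cipher aes, byte[] chunk, int len, byte[] startState) throws Exception {
        byte[] state = startState.clone();
        for (int off = 0; off < len; off += 16) {
            int n = Math.min(16, len - off);
            // XOR the block into the state. Missing bytes of a short final
            // block count as zero padding, so they leave the state unchanged.
            for (int j = 0; j < n; j++) {
                state[j] ^= chunk[off + j];
            }
            state = aes.doFinal(state); // one AES block encryption
        }
        return state;
    }

    public static void main(String[] args) throws Exception {
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(new byte[16], "AES"));
        byte[] chunk = new byte[40]; // two full blocks plus an 8-byte tail
        byte[] mac = chunkMac(aes, chunk, chunk.length, new byte[16]);
        System.out.println(mac.length);
    }
}
```

Running main prints `16`. The file-level step in the question has the same shape: file_mac = AES(file_mac XOR chunk_mac), starting from zero.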

There is also a potential problem in your code: you try to read 16 bytes, and if you get fewer, you start padding. However, the contract of read is: "An attempt is made to read as many as len bytes, but a smaller number may be read." Typically the smaller number happens at the end of the file, but in principle it can happen at any time. If it does, you will pad a block in the middle of the file and get a wrong MAC.
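A helper that honors this contract could loop until the block is full or the stream ends, and only then zero-fill the rest. ReadBlock and readBlock are hypothetical names. When the exact length is known to be available, RandomAccessFile.readFully or DataInputStream.readFully do this looping for you. They throw EOFException instead of returning a short count.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadBlock {

    // Fills buf[0..len) from in, calling read() repeatedly because a single
    // call may return fewer bytes than requested before end of stream.
    // Returns the number of bytes read; it is less than len only at end of
    // stream. The unfilled tail is zeroed, matching the question's padding.
    static int readBlock(InputStream in, byte[] buf, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, total, len - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        Arrays.fill(buf, total, len, (byte) 0);
        return total;
    }
}
```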

Zielu
  • Thanks for the comment, really appreciated! Unfortunately I forgot to add some information: I did try to read the whole file and process that (looping over every 16 bytes is still necessary), but it didn't speed things up much. I also tried a plain FileInputStream instead of the seek, but that doesn't help either, and removing the ByteArrayOutputStream doesn't help much. It does, however, look like the function str_to_a32 costs the most time, so making it faster would solve the problem. – Maarten Smit Mar 22 '15 at 14:44
  • I cleaned up your str_to_a32 method; it now runs about 2 times faster (depending on the input length). You can do the same for a32_to_str. As I said, you should work with longer buffers (arrays) in your code and pass the appropriate indexes instead of creating and copying byte chunks. – Zielu Mar 22 '15 at 15:53
  • Your clean-up made it a lot faster. I applied the same to a32_to_str and added an extra function that takes bytes as input, to avoid the bytes->String->bytes conversion. Thanks! – Maarten Smit Mar 23 '15 at 00:09
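Following the suggestion to "do the same for a32_to_str", here is a sketch of the inverse conversion without ByteBuffer, copyOfRange, or ByteArrayInputStream. FastA32ToStr and fastA32ToStr are hypothetical names. The sketch assumes each long holds an unsigned 32-bit value, as str_to_a32 produces, and keeps only the low 32 bits, matching the original a32_to_str. StandardCharsets needs Android API level 19 or later. On older versions the "ISO-8859-1" string overload of the String constructor works instead.

```java
import java.nio.charset.StandardCharsets;

public class FastA32ToStr {

    // Writes the low 32 bits of each long as four big-endian bytes and decodes
    // them as ISO-8859-1, so each byte becomes exactly one char (0..255),
    // just as a32_to_str does.
    static String fastA32ToStr(long[] data) {
        byte[] bytes = new byte[data.length * 4];
        for (int i = 0; i < data.length; i++) {
            long v = data[i];
            bytes[4 * i] = (byte) (v >>> 24);
            bytes[4 * i + 1] = (byte) (v >>> 16);
            bytes[4 * i + 2] = (byte) (v >>> 8);
            bytes[4 * i + 3] = (byte) v;
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

For the cipher calls, returning the byte[] built in the loop directly, instead of a String, would also skip the getBytes("ISO-8859-1") round trip.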