
I have 10,000 to 12,000 image files, taking up about 800 MB, in external storage.

I am using a loop that takes each file path and generates its MD5, but because of the huge number of files being read, this takes a lot of time.

This is the method that generates the MD5 of a file:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public static String getMd5OfFile(String filePath) {
    StringBuilder hex = new StringBuilder();
    // try-with-resources closes the stream even if an exception is thrown
    try (InputStream input = new FileInputStream(filePath)) {
        MessageDigest md5Hash = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[2048];
        int numRead;
        while ((numRead = input.read(buffer)) != -1) {
            md5Hash.update(buffer, 0, numRead);
        }
        // convert the 16-byte digest to two hex digits per byte
        for (byte b : md5Hash.digest()) {
            hex.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
        }
    } catch (IOException | NoSuchAlgorithmException e) {
        e.printStackTrace();
    }
    return hex.toString().toUpperCase();
}

So the question is: can I increase the buffer size to make the operation faster, and by how much can I increase it without breaking the operation or corrupting the generated MD5?

And will wrapping the input stream in a BufferedInputStream make it faster?


1 Answer


As with any optimisation problem, you should measure your performance to learn whether any of the changes you make have an impact.

2k is certainly a small buffer size and a larger one could do better. But I/O stacks have buffers all the way down, so it might have negligible impact. Try and measure yourself.
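
For example, a rough sketch of a variant with a BufferedInputStream and a larger buffer (the 64 KiB sizes here are arbitrary starting points, not tuned values):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

// Sketch only: same hashing loop as in the question, but with a
// BufferedInputStream and a larger application buffer. Benchmark on
// your own device to find a good size.
public static byte[] md5WithLargerBuffer(String filePath) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    try (InputStream in = new BufferedInputStream(
            new FileInputStream(filePath), 64 * 1024)) {
        byte[] buffer = new byte[64 * 1024];
        int numRead;
        while ((numRead = in.read(buffer)) != -1) {
            md5.update(buffer, 0, numRead);
        }
    }
    return md5.digest(); // hex-encode as in the question if needed
}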

Another optimisation worth trying comes from noticing that reading a file is an I/O-bound operation while computing MD5 is CPU-bound: have one thread read file content and another thread update the MD5 state. Depending on the number of CPU cores on your device, you could also hash multiple files in parallel with performance gains.
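
As a minimal sketch of the parallel variant, assuming a fixed-size thread pool and reusing getMd5OfFile() from the question (pool sizing and error handling are simplified):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: hash several files in parallel, one task per file.
public static List<String> hashAll(List<String> paths) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());
    try {
        List<Future<String>> futures = new ArrayList<>();
        for (String path : paths) {
            Callable<String> task = () -> getMd5OfFile(path);
            futures.add(pool.submit(task));
        }
        List<String> hashes = new ArrayList<>();
        for (Future<String> f : futures) {
            hashes.add(f.get()); // blocks until that file's hash is ready
        }
        return hashes;
    } finally {
        pool.shutdown();
    }
}

If the bottleneck is the storage medium rather than the CPU, parallel reads may not help and can even hurt, so measure both variants.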

  • I am doing this in an Android app. I did not totally understand, but I got the point: buffered file streams are better, and MD5 hashing and file reading should be performed in different threads. Still, any code would help more. – dan walker May 03 '19 at 10:26
  • I am doing this in an AsyncTask for more than 5000 images totaling 890 MB while the UI thread shows a loading indicator, and it takes about 30 to 40 seconds; the time still depends on the number of files. I changed the byte buffer to (1024 * 12) = 12288, which was better; increasing it further caused lag, so 12288 was best. I still need it to take less time, but how? – dan walker May 03 '19 at 13:20