
The short question:

Why does calculating the MD5 sum of a 5 MB file in Java take 84 seconds on a Raspberry Pi, while a Mac needs only 25 ms?

The whole question:

I need to write a Java program that calculates the MD5 or SHA checksum of a set of files totaling about 50 GB.

As a first step, I wrote a simple Java program that calculates the checksum of a single 5 MB file:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Main {

    public static final int BLOCKSIZE = 8 * 1024;

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        // The test file is loaded from the classpath, next to Main.class
        String path = Main.class.getResource("file5M.img").getPath();
        File file = new File(path);
        MessageDigest messageDigest = MessageDigest.getInstance("MD5");

        long fileSize = file.length();
        long alreadyRead = 0;
        byte[] bytes = new byte[BLOCKSIZE];
        long startTime = System.currentTimeMillis();
        // try-with-resources closes the stream even if read() throws
        try (InputStream fin = new FileInputStream(file)) {
            // Stop after fileSize bytes: the real program reads from a socket without EOF
            while (alreadyRead < fileSize) {
                int maxToRead = (int) Math.min(BLOCKSIZE, fileSize - alreadyRead);
                int length = fin.read(bytes, 0, maxToRead);
                if (length < 0) break; // stream ended early
                messageDigest.update(bytes, 0, length);
                alreadyRead += length;
            }
        }
        byte[] md5 = messageDigest.digest();
        long elapsedTime = System.currentTimeMillis() - startTime;
        System.out.println("Time:\t" + elapsedTime + " ms\tRead:\t" + alreadyRead / 1024 / 1024 + " MB");

        // Print the digest as lower-case hex, the format md5sum uses
        StringBuilder hex = new StringBuilder();
        for (byte b : md5) hex.append(String.format("%02x", b));
        System.out.println("MD5: " + hex);
    }
}
```

To create a random test file, I used this Linux command:

```sh
dd if=/dev/urandom of=file5M.img bs=1M count=5
```

Running the program on different devices led to confusing results:

<table style="width:100%">
  <tr>
    <th>Time (ms)</th>
    <th>Computer</th>
    <th>CPU</th>
    <th>RAM</th>
    <th>Storage</th>
    <th>Operating system</th>
  </tr>
  <tr>
    <td>24</td>
    <td>MacBook Pro (13-inch, 2016)</td>
    <td>3.3 GHz Intel Core i7</td>
    <td>8 GB 2133 MHz LPDDR3</td>
    <td>APPLE SSD AP1024J</td>
    <td>macOS Sierra</td>
  </tr>
  <tr>
    <td>45000</td>
    <td>Raspberry Pi Model B</td>
    <td>0.7 GHz ARMv6 (32-bit)</td>
    <td>256 MB</td>
    <td>PRO microSD card (SD adapter)</td>
    <td>Arch Linux</td>
  </tr>
  <tr>
    <td>7600</td>
    <td>Odroid XU4</td>
    <td>Exynos5 Octa: Cortex™-A15 1.6 GHz quad core and Cortex™-A7 quad core</td>
    <td>2 GB LPDDR3 RAM (PoP)</td>
    <td>Samsung PRO microSD card (SD adapter)</td>
    <td>Arch Linux for Odroid-XU3</td>
  </tr>
  <tr>
    <td>300</td>
    <td>VirtualBox on MacBook Pro</td>
    <td>1 core at 0.7 GHz (21% of the Mac CPU), no PAE/NX, no acceleration</td>
    <td>256 MB of Mac RAM; PIIX3 chipset with APIC</td>
    <td>Dynamically allocated 8 GB (VDI)</td>
    <td>Arch Linux 64-bit</td>
  </tr>
</table>
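The table records hardware but not the Java runtime, and benchmark results also depend on which JVM build runs the code and how it is configured. A small sketch like the following could be run on each machine to log that information next to the timings. It only reads standard system properties; `java.vm.info` is not one of the guaranteed properties, but OpenJDK-based VMs typically set it (for example to something like "mixed mode"), and it prints `null` where it is absent. The class name `JvmInfo` is hypothetical.

```java
public class JvmInfo {

    // Collects runtime details that can affect benchmark results on a given machine.
    public static String describe() {
        String[] keys = {
            "java.version", "java.vm.name", "java.vm.version", "java.vm.info", "os.name", "os.arch"
        };
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            // System.getProperty returns null for keys the VM does not define
            sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
        }
        Runtime rt = Runtime.getRuntime();
        sb.append("available processors = ").append(rt.availableProcessors()).append('\n');
        sb.append("max heap (MB) = ").append(rt.maxMemory() / (1024 * 1024));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```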

So why does the program run so much faster on the MacBook, even when I restrict the CPU and RAM in VirtualBox?

Where could the bottleneck be?

What do I have to do to make the program execute in about 300 ms on the Odroid-XU4?

Remarks:

I don't think the microSD card I/O is the bottleneck, because the program reads the whole file very quickly when it does not calculate the MD5 sum.

Lowering the CPU frequency on the Odroid from 2 GHz to 500 MHz increased the computation time from 7 to 24 seconds.
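A way to separate hashing cost from storage cost, along the lines of the remarks above, is to hash a 5 MB buffer that is already in memory, so the microSD card is not involved at all. The sketch below assumes random data from `java.util.Random` is an adequate stand-in for the `/dev/urandom` file, and the warm-up count of 3 is an arbitrary choice to give a JIT compiler (if the JVM has one) a chance to compile the hot code. The class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class DigestThroughput {

    // Hashes an in-memory buffer in blockSize chunks, so no file or SD-card I/O is involved.
    public static byte[] digestBuffer(String algorithm, byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
        for (int off = 0; off < data.length; off += blockSize) {
            md.update(data, off, Math.min(blockSize, data.length - off));
        }
        return md.digest();
    }

    // Lower-case hex, the same format md5sum prints.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = new byte[5 * 1024 * 1024]; // same size as the test file
        new Random(42).nextBytes(data);          // stand-in for the /dev/urandom file

        // Warm-up rounds; untimed
        int warmUpRounds = 3;
        for (int i = 0; i < warmUpRounds; i++) {
            digestBuffer("MD5", data, 8 * 1024);
        }

        long start = System.nanoTime();
        byte[] md5 = digestBuffer("MD5", data, 8 * 1024);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("MD5 of 5 MB in memory: " + elapsedMs + " ms (" + toHex(md5) + ")");
    }
}
```

If the in-memory time on a board is close to its full-program time, storage is not the bottleneck there. On the slowest boards each round may take tens of seconds, so the warm-up count may need lowering.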

Harald
  • Why does your read-loop look like that? – Kayaman Apr 16 '17 at 15:23
  • @Kayaman because the files will be transferred over a socket, which doesn't signal EOF after the file transfer is complete. But I know the size of the file. – Harald Apr 16 '17 at 15:30
  • Wha? So you're showing code that uses a `FileInputStream`, but the actual code reads the file from a socket? Why show code you're not running? – Kayaman Apr 16 '17 at 15:33
  • Because the benchmarks I describe were measured with this code. I don't think waiting for EOF will make a difference. However, you are right that I could have simplified the code further. – Harald Apr 16 '17 at 15:43
  • It's just that every time someone uses a non-standard read-loop, it casts a doubt on the whole code, even if it wouldn't make a difference here. – Kayaman Apr 16 '17 at 15:56
  • Yeah, true. However, I appreciate any idea about why the measurements look like this, even if the bottleneck is in the code shown. – Harald Apr 16 '17 at 16:09
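Harald explains that the real program reads from a socket that does not signal EOF, which is why the loop counts bytes against a known size. A minimal sketch of that pattern as a reusable method, assuming the size is known in advance, might look like this; it stops after exactly `size` bytes, leaves anything after them unread, and reports a truncated stream as an `EOFException`. `java.security.DigestInputStream` is an existing alternative for hashing while reading, but it does not bound the read by itself. The names `BoundedDigest` and `digestExactly` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BoundedDigest {

    // Digests exactly `size` bytes from `in` and never reads past them.
    public static byte[] digestExactly(InputStream in, long size, String algorithm)
            throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
        byte[] buf = new byte[8 * 1024];
        long remaining = size;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                throw new EOFException(remaining + " bytes missing before end of stream");
            }
            md.update(buf, 0, n);
            remaining -= n;
        }
        return md.digest();
    }

    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulates a connection carrying a 3-byte file followed by the next message
        InputStream in = new ByteArrayInputStream("abcNEXT".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(digestExactly(in, 3, "MD5"))); // MD5 of "abc"
        System.out.println((char) in.read());                   // prints N: rest is still unread
    }
}
```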

1 Answer


The Raspberry Pi has a much lower RAM frequency than the MacBook. This is probably why the program runs faster on the Mac, even inside VirtualBox: when you read a file, it is stored in RAM, and even though RAM is fast, the program has to access memory every time it reads the file and feeds it to the MD5 algorithm.

Moreover, if you want to improve performance, I suggest using threads in your program (distributing the files among the threads). Note that threads are useless if you only have one core, as in your VM.
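Distributing files among threads, as suggested above, could be sketched with a fixed thread pool from `java.util.concurrent`. This assumes one task per file and a pool sized to `availableProcessors()`; each task creates its own `MessageDigest`, because instances are not thread-safe. It only helps when hashing, not storage, is the limit, and it does not speed up a single file. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChecksums {

    // Hashes one file with its own MessageDigest instance.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Submits one task per file to a pool with one thread per available core.
    public static Map<Path, String> hashAll(List<Path> files)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            Map<Path, Future<String>> futures = new LinkedHashMap<>();
            for (Path f : files) {
                futures.put(f, pool.submit(() -> md5Hex(f)));
            }
            // Collect results in input order; get() rethrows task failures
            Map<Path, String> results = new LinkedHashMap<>();
            for (Map.Entry<Path, Future<String>> e : futures.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String a : args) files.add(Paths.get(a));
        hashAll(files).forEach((path, md5) -> System.out.println(md5 + "  " + path));
    }
}
```

Usage: `java ParallelChecksums file1 file2 ...` prints one line per file in the same layout as `md5sum`.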

Peter Mortensen
ShellCode
  • Sounds logical, but does this also apply to the Odroid XU4? The difference between 7000 ms on it and 20 ms on the Mac is extreme. And shouldn't the file easily fit in the 2 GB of onboard RAM? Is the Odroid's LPDDR3 RAM really that slow? – Harald Apr 16 '17 at 15:37
  • As you can see here: http://www.hardkernel.com/main/products/prdt_info.php?g_code=G143452239825&tab_idx=2, the Odroid's RAM frequency is about 900 MHz, which is much lower than the 2xxx MHz of the MacBook. – ShellCode Apr 16 '17 at 15:42
  • OK, nice answer :) However, this suggests the Odroid's RAM is only about 3 times slower than the Mac's. So why is the result not about 3 × 20 ms? Or at least 0.5 seconds? – Harald Apr 16 '17 at 15:47
  • I don't think it's linear; it's not that simple. You can't predict how much time it will take just by comparing RAM frequencies :) – ShellCode Apr 16 '17 at 15:56
  • I see your point. Can we somehow prove that it really is the RAM frequency? Maybe I can simulate it in VirtualBox, or we could derive a speedup formula from the RAM frequencies of different devices, which would prove your theory... – Harald Apr 16 '17 at 16:05
  • Try overclocking (underclocking works too) your RAM and compare the results. – ShellCode Apr 16 '17 at 16:10
  • Good idea. I will do it after lunch or so; this could take some time. Please stay tuned, I will reply when I have the results ;) – Harald Apr 16 '17 at 16:14
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/141907/discussion-between-user3694044-and-shellcode). – Harald Apr 17 '17 at 17:20
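Harald asks how to check whether RAM frequency really explains the gap. Besides re-clocking the RAM, a rough first check is to measure how fast the JVM can just copy a 5 MB buffer on each machine and compare that with the time the question's MD5 program takes there. If the copy takes only milliseconds while hashing takes seconds, memory bandwidth alone seems unlikely to account for the difference. This is a crude proxy under stated assumptions: `System.arraycopy` throughput mixes cache and RAM effects, and the buffer size and round counts are arbitrary. The class name `CopyBandwidth` is hypothetical.

```java
public class CopyBandwidth {

    // Copies src into dst `rounds` times and returns the observed throughput in MB/s.
    public static double copyMbPerSecond(byte[] src, byte[] dst, int rounds) {
        if (dst.length < src.length || rounds <= 0) {
            throw new IllegalArgumentException("dst too small or rounds not positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            System.arraycopy(src, 0, dst, 0, src.length);
        }
        long elapsedNs = Math.max(1, System.nanoTime() - start); // avoid division by zero
        double megabytes = (double) src.length * rounds / (1024 * 1024);
        return megabytes / (elapsedNs / 1e9);
    }

    public static void main(String[] args) {
        byte[] src = new byte[5 * 1024 * 1024]; // same size as the test file
        byte[] dst = new byte[src.length];
        copyMbPerSecond(src, dst, 10);          // untimed warm-up
        System.out.printf("Copy throughput: %.0f MB/s%n", copyMbPerSecond(src, dst, 50));
    }
}
```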