3

I noticed that when I use readFully() on a file instead of read(byte[]), processing time is reduced greatly. However, it occurred to me that readFully may be a double-edged sword: if I accidentally try to read in a huge, multi-gigabyte file, it could choke?

Here is a function I am using to generate an SHA-256 checksum:

public static byte[] createChecksum(File log, String type) throws Exception {
    DataInputStream fis = new DataInputStream(new FileInputStream(log));
    byte[] buffer = new byte[(int) log.length()];
    fis.readFully(buffer); // TODO: readFully may come at the risk of
                           // choking on a huge file.
    fis.close();
    MessageDigest complete = MessageDigest.getInstance(type);
    complete.update(buffer);
    return complete.digest();
}

If I were to instead use:

DataInputStream fis = new DataInputStream(new BufferedInputStream(new FileInputStream(log)));

Would that alleviate this risk? Or... is the best option (in situations where you can't guarantee data size) to always control the amount of bytes read in and use a loop until all bytes are read?

(Come to think of it, since the MessageDigest API takes in the full byte array at once, I'm not sure how to obtain a checksum without stuffing all the data in at once, but I suppose that is another question for another thread.)

E.S.
    The `update()` method, which you are using, doesn't require all of the data. You can invoke it multiple times per digest. – erickson Jun 20 '13 at 00:46

3 Answers

4

You should just allocate a decently sized buffer (65536 bytes, perhaps) and loop, reading 64 KB at a time and calling complete.update() inside the loop to feed the digester. Be careful with the last block so you only process the number of bytes actually read (probably less than 64 KB).
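A minimal sketch of that loop, reworking the asker's createChecksum to digest the file in chunks (the class name and the 64 KB buffer size are just illustrative choices):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumExample {
    public static byte[] createChecksum(File log, String type)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest complete = MessageDigest.getInstance(type);
        try (InputStream in = new FileInputStream(log)) {
            byte[] buffer = new byte[65536]; // fixed 64 KB chunk, regardless of file size
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Only the first 'read' bytes of the buffer are valid;
                // the final chunk is usually shorter than 64 KB.
                complete.update(buffer, 0, read);
            }
        }
        return complete.digest();
    }
}
```

Memory use stays at one 64 KB buffer no matter how large the file is, so the multi-gigabyte case is no longer a problem.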

faffaffaff
2

Reading the file will take as long as it takes, whether you use readFully() or not.

Whether you can actually allocate gigabyte-sized byte arrays is another question. There is no need to use readFully() at all when downloading files. It's for use in wire protocols where, say, the next 12 bytes are an identifier followed by another 60 bytes of address information, and you don't want to have to keep writing read loops.
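To illustrate that intended use: with a hypothetical record layout of a 12-byte identifier followed by 60 bytes of address data, readFully() replaces the partial-read loop you would otherwise need:

```java
import java.io.DataInputStream;
import java.io.IOException;

public class HeaderExample {
    // Hypothetical wire-protocol record: a 12-byte identifier
    // followed by 60 bytes of address information.
    public static byte[][] readRecord(DataInputStream in) throws IOException {
        byte[] id = new byte[12];
        byte[] address = new byte[60];
        in.readFully(id);       // blocks until all 12 bytes have arrived
        in.readFully(address);  // no loop over partial reads needed
        return new byte[][] { id, address };
    }
}
```

Here the sizes are small and known in advance, which is exactly the situation readFully() is designed for.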

user207421
1

readFully() isn't going to choke if the file is multiple gigabytes, but allocating that byte buffer will. You'll get an OutOfMemoryError before you ever get to the call to readFully().

You need to use the method of updating the hash with chunks of the file repeatedly, rather than updating it all at once with the entire file.
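One convenient way to do that, besides an explicit update() loop, is java.security.DigestInputStream, which feeds every byte read through it into the digest automatically (a sketch; the class name is illustrative):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamChecksum {
    public static byte[] checksum(File file, String type)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(type);
        try (InputStream in = new DigestInputStream(new FileInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Each read() call updates 'md' as a side effect,
                // so the loop body stays empty.
            }
        }
        return md.digest();
    }
}
```

The file is never held in memory in full; only one 8 KB buffer is live at a time.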

Idles