I have a lot of files in HDFS and want to copy them into SequenceFiles with an MR job. The key type of the sequence file is Text (I use the SHA1 of the file), and the value type is BytesWritable (the file content). I found some example code that reads the whole file content into a byte array, say buffer, and then sets that buffer on the BytesWritable object. For example:
// read the whole file into memory, then append one key/value pair
byte[] buffer = new byte[(int) file.length()];
FileInputStream fis = new FileInputStream(fileEntry);
int length = fis.read(buffer);   // note: read() is not guaranteed to fill the buffer
fis.close();

key.set(sha1);
value.set(buffer, 0, buffer.length);
writer.append(key, value);
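For reference, this is a minimal sketch of how the writer and the key/value objects used above might be created (assuming the Hadoop 2.x SequenceFile.Writer.Option API); the output path and configuration are my own placeholders, not part of the original snippet:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
Path seqPath = new Path("/user/me/files.seq");   // hypothetical output path

SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(seqPath),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(BytesWritable.class));

Text key = new Text();
BytesWritable value = new BytesWritable();
// ... append key/value pairs as in the snippet above ...
writer.close();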
My question is: if an input file is very big, the buffer may exceed the memory limit. Can I append to the BytesWritable object in a loop, writing a smaller amount of data in each iteration? Or can I just hand an input stream to the BytesWritable object and let it handle the problem?
Thanks.