Appending to SequenceFiles seems to be very slow. We're converting folders of small files into SequenceFiles, using the filename as the key and the file contents as the value. However, throughput is quite low at about 2 MB/s (roughly 2 to 3 files per second). We have millions of small files, and at most 3 files per second is far too slow for our purposes.
What we're doing is simply:
    for (String file : files) {
        // Read the whole small file into memory.
        byte[] data = Files.readAllBytes(Paths.get(dir.getAbsolutePath()
                + File.separatorChar + file));
        // Key is the filename, value is the file contents.
        byte[] keyBytes = file.getBytes("UTF-8");
        BytesWritable key = new BytesWritable(keyBytes);
        BytesWritable val = new BytesWritable(data);
        seqWriter.append(key, val);
    }
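To narrow down where the time goes, one thing we could do is time the file reads on their own, without any append. The following is only a minimal sketch using plain java.nio; the class name ReadThroughputProbe and its Result type are hypothetical and not part of our actual code. If this loop alone is as slow as the full conversion, the bottleneck would be reading the small files rather than the SequenceFile writer.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: measures how fast the small files can be read,
// independent of any SequenceFile append.
public class ReadThroughputProbe {

    /** Result of one timing run: files read, bytes read, elapsed nanoseconds. */
    public static final class Result {
        public final long files;
        public final long bytes;
        public final long nanos;

        Result(long files, long bytes, long nanos) {
            this.files = files;
            this.bytes = bytes;
            this.nanos = nanos;
        }

        /** Throughput in MB/s (1 MB = 1,000,000 bytes). */
        public double megabytesPerSecond() {
            double seconds = Math.max(nanos, 1L) / 1e9; // avoid division by zero
            return (bytes / 1e6) / seconds;
        }

        /** Throughput in files per second. */
        public double filesPerSecond() {
            double seconds = Math.max(nanos, 1L) / 1e9;
            return files / seconds;
        }
    }

    /** Reads every regular file directly inside dir and times the whole loop. */
    public static Result measureRead(Path dir) throws IOException {
        long files = 0;
        long bytes = 0;
        long start = System.nanoTime();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories and special files
                }
                byte[] data = Files.readAllBytes(p);
                files++;
                bytes += data.length;
            }
        }
        long nanos = System.nanoTime() - start;
        return new Result(files, bytes, nanos);
    }

    public static void main(String[] args) throws IOException {
        Result r = measureRead(Paths.get(args[0]));
        System.out.printf("%d files, %d bytes, %.2f MB/s, %.1f files/s%n",
                r.files, r.bytes, r.megabytesPerSecond(), r.filesPerSecond());
    }
}
```

Running it against one of the input folders (passed as the first argument) and comparing the numbers with the 2 MB/s we see for the full loop should show which of the two steps dominates.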
Any hints or ideas on how to speed things up?