
When constructing an InputStream on a RandomAccessFile so that Kryo can deserialize objects from it, it makes a huge difference for performance whether the mediating InputStream is constructed through the file's Channel (good performance) or through its FileDescriptor (terrible performance):

RandomAccessFile ra = new RandomAccessFile(dataFile, "r");

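// Fast: stream constructed on the file's existing Channel
Input input1 = new InputWithRandomAccessFile(Channels.newInputStream(ra.getChannel()), FILE_BUF_SIZE, ra);
// Slow: stream constructed on the file's FileDescriptor
Input input2 = new InputWithRandomAccessFile(new FileInputStream(ra.getFD()), FILE_BUF_SIZE, ra);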

InputWithRandomAccessFile is my own class extending Kryo's Input class; its only additional behavior is that it seeks to the correct position in the random-access file when #setPosition is called.

Reading 3,000 fixed-size objects from input1 takes around 600 ms; from input2 it takes around 16 seconds.
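For anyone who wants to compare the two stream constructions outside of Kryo, here is a minimal timing sketch. It is not the code I measured above: the class name `StreamTimingSketch`, the `CHUNK` size, and the 8 MiB zero-filled temp file are all assumptions made for illustration, and it only reads sequentially, so it may well not reproduce the gap I see with seeking reads.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamTimingSketch {

    // Hypothetical stand-in for FILE_BUF_SIZE from the question.
    static final int CHUNK = 4096;

    /** Drains the stream in CHUNK-sized reads and returns the number of bytes read. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[CHUNK];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** Reads the whole file through a stream built on the file's channel. */
    static long readViaChannel(Path file) throws IOException {
        try (RandomAccessFile ra = new RandomAccessFile(file.toFile(), "r")) {
            return drain(Channels.newInputStream(ra.getChannel()));
        }
    }

    /** Reads the whole file through a stream built on the file's descriptor. */
    static long readViaFd(Path file) throws IOException {
        // A separate RandomAccessFile is opened here because both kinds of
        // stream share the file position of the RandomAccessFile they wrap.
        try (RandomAccessFile ra = new RandomAccessFile(file.toFile(), "r")) {
            return drain(new FileInputStream(ra.getFD()));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("stream-timing", ".bin");
        try {
            Files.write(file, new byte[8 * 1024 * 1024]); // 8 MiB of zeros as test data

            long t0 = System.nanoTime();
            long viaChannel = readViaChannel(file);
            long t1 = System.nanoTime();
            long viaFd = readViaFd(file);
            long t2 = System.nanoTime();

            System.out.println("channel stream: " + viaChannel + " bytes in "
                    + (t1 - t0) / 1_000_000 + " ms");
            System.out.println("fd stream:      " + viaFd + " bytes in "
                    + (t2 - t1) / 1_000_000 + " ms");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Each variant opens its own RandomAccessFile so that neither run starts from a position left behind by the other. Timings from a single cold run like this are only a rough indication.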

Jimmy Jam
  • Most likely Kryo has been optimised for FileChannel and has to use some sort of fallback if you pass an InputStream. Can you try `new FileInputStream(ra.getFD()).getChannel()`? – Peter Lawrey May 02 '14 at 10:15
  • Maybe strace and/or a Java profiler can give you a clue. – Artefacto May 02 '14 at 10:17
  • `input2` opens a completely new, sequential access channel, while `Channels.newInputStream()` simply wraps the existing random access channel. This might be a factor. – biziclop May 02 '14 at 10:19
  • What happens if you wrap the `FileInputStream` in a `BufferedInputStream`? I realise your class takes a buffer argument but it's not clear how this is used... – monocell May 02 '14 at 10:21
  • @monocell -- The buffer arg goes to the super constructor: Kryo's input class will use this to create its own buffer, which is why a BufferedInputStream is not desirable when using Kryo in this way. Just to be sure I tried what you suggested: the result was the same, around 16 seconds. – Jimmy Jam May 02 '14 at 10:33
  • @PeterLawrey -- Kryo's Input constructor doesn't take a Channel, so what would I do with the Channel created from the stream as you suggest? I'd have to wrap it in yet another InputStream. [EDIT: I tried this anyway, and it *is* fast: 600 ms.] – Jimmy Jam May 02 '14 at 10:36
  • @biziclop -- I think you're right. The point seems to be that the stream must be constructed on an existing Channel for the "Channel I/O" to be used. Apparently, if the stream is constructed otherwise (even from an FD for a file on which a Channel is open), it uses a much slower I/O implementation. This also explains why PeterLawrey's suggestion _did_ work. – Jimmy Jam May 02 '14 at 10:42
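The variant suggested in the comments (take the channel of a FileInputStream built on the descriptor, then wrap that channel in a stream again) can be sketched as follows. The class name `ChannelFromFdSketch` and the helper `channelStreamFromFd` are hypothetical, the four-byte temp file is made-up test data, and Kryo is left out entirely; this only shows how the stream is constructed and that it still follows seeks on the RandomAccessFile.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChannelFromFdSketch {

    /**
     * Builds a stream on the channel of a FileInputStream that shares the
     * RandomAccessFile's descriptor (hypothetical helper name).
     */
    static InputStream channelStreamFromFd(RandomAccessFile ra) throws IOException {
        FileChannel ch = new FileInputStream(ra.getFD()).getChannel();
        return Channels.newInputStream(ch);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("channel-from-fd", ".bin");
        try {
            Files.write(file, new byte[] {10, 20, 30, 40});
            try (RandomAccessFile ra = new RandomAccessFile(file.toFile(), "r")) {
                InputStream in = channelStreamFromFd(ra);
                // The descriptor is shared, so seeking on the RandomAccessFile
                // moves the position the stream reads from.
                ra.seek(2);
                System.out.println("byte at offset 2: " + in.read());
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

In the question's setup, the returned stream would take the place of the first constructor argument of InputWithRandomAccessFile.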
