How can I read a specific number of bytes from a FileInputStream object using buffers

Question

I have a series of objects stored within a file concatenated as below:

sizeOfFile1 || file1 || sizeOfFile2 || file2 ...

Each size is a serialized long object, and each file is stored as its raw bytes.

I am trying to extract the files from the input file. Below is my code:

FileInputStream fileInputStream = new FileInputStream("C:\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
  long size = (long) objectInputStream.readObject();
  FileOutputStream fileOutputStream = new FileOutputStream("C:\" + size + ".tst");
  BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
  int chunkSize = 256;
  final byte[] temp = new byte[chunkSize];
  int finalChunkSize = (int) (size % chunkSize);
  final byte[] finalTemp = new byte[finalChunkSize];
  while(fileInputStream.available() > 0 && size > 0)
  {
    if (fileInputStream.available() > finalChunkSize)
    {
      int i = fileInputStream.read(temp);
      bufferedOutputStream.write(temp, 0, i);
      size = size - i;
    }
    else
    {
      int i = fileInputStream.read(finalTemp);
      bufferedOutputStream.write(finalTemp, 0, i);
      size = 0;
    }
  }
  bufferedOutputStream.close();
}
fileOutputStream.close();

My code fails after it reads the first sizeOfFile; it just reads the rest of the input file into one file when there are multiple files stored.

Can anyone see the issue here?

Regards.

Is it compiling? `"C:\" + size + ".tst"` is invalid String - should be `"C:\\" + size + ".tst"` — MGorgon, Dec 21 '13 at 00:08
Sorry, I made a mistake here when copying the code out of my project into the box. It does compile and run. I have detailed my error in the last line of my question. — Danny Rancher, Dec 21 '13 at 00:09
You should really think about using either some kind of compressing output stream (GZipOutputStream, ZipOutputStream), or avro, or thrift. Also, you need to have the close statements in a finally block. — msknapp, Dec 21 '13 at 00:17
Ugh, you should not use serialized longs, you are wasting lot of space. I would use [`readLong()`](http://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html#readLong%28%29) from DataInputStream and readFully() (as seen in answer below). — eckes, Oct 09 '14 at 00:29
You should not read from fileInputStream if you have an ObjectInputStream on top of it (as it pre-reads into its buffer). — eckes, Oct 09 '14 at 00:32

score 1 · Answer 1 · edited Oct 09 '14 at 00:30


Wrap it in a DataInputStream and use readFully(byte[]).

But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.

NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
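A minimal sketch of the readFully approach, assuming the sizes were written with DataOutputStream.writeLong (not ObjectOutputStream) and that each record fits in a single byte array. The class and method names (ReadFullyDemo, readRecords) are made up for illustration. The end of the input is detected by the EOFException from readLong, not by available():

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class ReadFullyDemo {
    // Reads every "size || bytes" record from the stream.
    // Assumption: sizes are 8-byte longs written by DataOutputStream.writeLong.
    static List<byte[]> readRecords(InputStream raw) throws IOException {
        List<byte[]> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(raw);
        while (true) {
            long size;
            try {
                size = in.readLong();
            } catch (EOFException endOfInput) {
                break; // no complete size prefix left: stop reading
            }
            // toIntExact throws if the record is too large for one array
            byte[] data = new byte[Math.toIntExact(size)];
            // readFully keeps reading until the array is full, or throws EOFException
            in.readFully(data);
            records.add(data);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Build a small in-memory container with two records
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(3); out.write(new byte[] {1, 2, 3});
        out.writeLong(2); out.write(new byte[] {4, 5});
        out.flush();

        List<byte[]> records = readRecords(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(records.size() + " records read");
    }
}
```

Unlike a single read(byte[]) call, which may return fewer bytes than requested, readFully either fills the whole array or fails, so the stream is always positioned at the next size prefix afterwards.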

answered Dec 21 '13 at 01:04 by user207421 · edited Oct 09 '14 at 00:30 by Colonel Thirty Two
I added javadoc links to your text. and +100 to available usage. – eckes Oct 09 '14 at 00:26

score 0 · Answer 2 · answered Dec 21 '13 at 00:17


You could try NIO instead:

FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);

This maps only the first SIZE bytes of the file (offsets 0 through SIZE - 1) into memory.
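To start and stop at particular points in the file, the second argument of map can be used as the starting offset. The following is only a sketch: MapRegionDemo, readRegion, offset and length are hypothetical names, and it assumes the requested region lies within the file. Note that a single map call cannot cover more than Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class MapRegionDemo {
    // Returns `length` bytes of `file`, starting at byte `offset`.
    // Assumption: offset + length does not exceed the file size.
    static byte[] readRegion(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // map(mode, position, size): position is where the mapped region begins
            ByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] result = new byte[length];
            region.get(result); // copy the mapped bytes into a regular array
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("region", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 11, 12, 13, 14, 15});

        // Read 3 bytes starting at offset 2
        byte[] middle = readRegion(tmp, 2, 3);
        System.out.println(Arrays.toString(middle));
    }
}
```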

answered Dec 21 '13 at 00:17 by Software Engineer

Thank you. How can I start and stop reading bytes within the file at particular points though. – Danny Rancher Dec 21 '13 at 00:27
Will this extend to large files? I'm reading files around 1.5GB in size. – Danny Rancher Dec 21 '13 at 00:32
Yeah, no issue with that. NIO is, partially, designed for this. – Software Engineer Dec 21 '13 at 00:33
I'm working with a file which reports 1460750276 from roChannel.size(). The map is failing. what are your thoughts? – Danny Rancher Dec 30 '13 at 18:38

score 0 · Answer 3 · answered Oct 09 '14 at 00:38

This is using DataInput to read longs. In this particular case I am not using readFully() as a segment might be too long to keep it in memory:

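DataInputStream in = new DataInputStream(new FileInputStream(...));
byte[] buf = new byte[64*1024];
while (true) {
  long size;
  // readLong() throws EOFException once no further size prefix is available
  try { size = in.readLong(); } catch (EOFException e) { break; }
  // open the output only after a size has been read, so nothing leaks at end of input
  OutputStream out = ...;
  while (size > 0) {
    // never request more than the bytes left in this segment
    int len = (int) Math.min(size, buf.length);
    len = in.read(buf, 0, len);
    // read() returns -1 if the input ends in the middle of a segment
    if (len < 0) throw new EOFException("input ended inside a segment");
    out.write(buf, 0, len);
    size -= len;
  }
  out.close();
}
in.close();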
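The loop above expects each size to be stored as 8 raw bytes, as readLong reads them, so the container has to be written with DataOutputStream.writeLong instead of ObjectOutputStream.writeObject. A rough sketch of the matching writer side follows; the names SizePrefixedWriter, writeRecord and pack are invented for this illustration, and it builds the container in memory only to keep the example short:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class SizePrefixedWriter {
    // Appends one "size || bytes" record; the size is 8 raw big-endian bytes.
    static void writeRecord(DataOutputStream out, byte[] content) throws IOException {
        out.writeLong(content.length);
        out.write(content);
    }

    // Builds a container holding the given records, in order.
    static byte[] pack(byte[]... records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] record : records) {
                writeRecord(out, record);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] container = pack(new byte[] {1, 2, 3}, new byte[] {4, 5});
        // Two 8-byte prefixes plus 3 + 2 payload bytes
        System.out.println(container.length + " bytes written");
    }
}
```

Each prefix costs exactly 8 bytes, whereas a serialized Long object also carries stream and class metadata.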

score -1 · Answer 4 · answered Dec 21 '13 at 00:35

Save yourself a lot of trouble by doing one of these things:

Switch to using Avro. Trust me, you would be crazy not to. It's easy to learn and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema, your old files are garbage.
Or use Thrift.
Or use Hibernate (but this is probably not a great option; Hibernate takes a lot of time to learn and a lot of configuration).

If you really refuse to switch to Avro, I recommend reading up on the IOUtils class from Apache Commons IO. It has a method to copy from an input stream to an output stream, saving you a lot of headaches. Unfortunately, what you want to do is a little more complicated, because you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that.

There are also GZIPOutputStream and ZipOutputStream; both are part of the JDK (java.util.zip), so they need no extra jars on your classpath.

I'm not going to write an example because I honestly think you should just learn avro or thrift and use that.
