
I have a series of objects stored within a file concatenated as below:

sizeOfFile1 || file1 || sizeOfFile2 || file2 ...

The sizes of the files are serialized Long objects, and the files themselves are just their raw bytes.

I am trying to extract the files from the input file. Below is my code:

FileInputStream fileInputStream = new FileInputStream("C:\\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
  long size = (long) objectInputStream.readObject();
  FileOutputStream fileOutputStream = new FileOutputStream("C:\\" + size + ".tst");
  BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
  int chunkSize = 256;
  final byte[] temp = new byte[chunkSize];
  int finalChunkSize = (int) (size % chunkSize);
  final byte[] finalTemp = new byte[finalChunkSize];
  while(fileInputStream.available() > 0 && size > 0)
  {
    if (fileInputStream.available() > finalChunkSize)
    {
      int i = fileInputStream.read(temp);
      bufferedOutputStream.write(temp, 0, i);
      size = size - i;
    }
    else
    {
      int i = fileInputStream.read(finalTemp);
      bufferedOutputStream.write(finalTemp, 0, i);
      size = 0;
    }
  }
  bufferedOutputStream.close();
}
objectInputStream.close(); // bufferedOutputStream.close() already closed each output file

My code fails after it reads the first sizeOfFile: when multiple files are stored, it reads the rest of the input file into a single output file.

Can anyone see the issue here?

Regards.

Danny Rancher
  • Is it compiling? `"C:\" + size + ".tst"` is invalid String - should be `"C:\\" + size + ".tst"` – MGorgon Dec 21 '13 at 00:08
  • Sorry, I made a mistake here when copying the code out of my project into the box. It does compile and run. I have detailed my error in the last line of my question. – Danny Rancher Dec 21 '13 at 00:09
  • You should really think about using either some kind of compressing output stream (GZIPOutputStream, ZipOutputStream), or Avro, or Thrift. Also, you need to have the close statements in a finally block. – msknapp Dec 21 '13 at 00:17
  • Ugh, you should not use serialized longs; you are wasting a lot of space. I would use [`readLong()`](http://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html#readLong%28%29) from DataInputStream and readFully() (as seen in the answer below). – eckes Oct 09 '14 at 00:29
  • You should not read from fileInputStream if you have an ObjectInputStream on top of it (as it pre-reads into its buffer). – eckes Oct 09 '14 at 00:32

4 Answers


Wrap it in a DataInputStream and use readFully(byte[]).
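A minimal sketch of the DataInputStream/readFully approach. It assumes the sizes are written with DataOutputStream.writeLong (as eckes suggests in the comments) rather than as serialized Long objects, because a DataInputStream cannot parse the object-serialization headers in the current file. The class and method names are hypothetical, and it reads from an in-memory stream for brevity; for a real file you would pass something like `new BufferedInputStream(new FileInputStream(...))`.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class SizePrefixedReader {

    // Hypothetical writer: each payload becomes an 8-byte big-endian size followed by its raw bytes.
    public static byte[] write(List<byte[]> payloads) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] payload : payloads) {
                out.writeLong(payload.length);
                out.write(payload);
            }
        }
        return bytes.toByteArray();
    }

    // Reads size/payload pairs until no further size header can be read.
    public static List<byte[]> readAll(InputStream source) throws IOException {
        List<byte[]> result = new ArrayList<>();
        DataInputStream in = new DataInputStream(source);
        while (true) {
            long size;
            try {
                size = in.readLong();
            } catch (EOFException endOfInput) {
                break; // also reached if a header is cut short
            }
            if (size < 0 || size > Integer.MAX_VALUE) {
                throw new IOException("Unexpected record size: " + size);
            }
            byte[] payload = new byte[(int) size];
            in.readFully(payload); // blocks until every byte is read, or throws EOFException
            result.add(payload);
        }
        return result;
    }
}
```

Unlike a single read(byte[]) call, readFully() never returns with a partially filled array, so the size bookkeeping in the question's inner loop disappears.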

But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.

NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
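If the file format has to stay as it is, one alternative (sketched here, not taken from the answer above) is to do all reading through the ObjectInputStream, which also implements readFully(), and to stop when readObject() throws EOFException instead of polling available(). It assumes the file was produced by writeObject(Long) followed by write(byte[]) on the same ObjectOutputStream; the class names are hypothetical, the writer exists only to produce test data, and a ByteArrayOutputStream stands in for each FileOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class SerializedSizeExtractor {

    // Hypothetical writer for the format sizeOfFile1 || file1 || ... with serialized Long sizes.
    public static byte[] write(List<byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (byte[] file : files) {
                out.writeObject(Long.valueOf(file.length)); // size as a serialized Long
                out.write(file);                            // raw bytes of the file
            }
        }
        return bytes.toByteArray();
    }

    // Every read goes through the ObjectInputStream; the underlying stream is never touched
    // directly, and available() is never consulted.
    public static List<byte[]> extract(InputStream source) throws IOException, ClassNotFoundException {
        List<byte[]> result = new ArrayList<>();
        ObjectInputStream in = new ObjectInputStream(source);
        byte[] buf = new byte[256];
        while (true) {
            long size;
            try {
                size = (Long) in.readObject();
            } catch (EOFException endOfInput) {
                break; // no more size objects
            }
            ByteArrayOutputStream file = new ByteArrayOutputStream(); // stand-in for a FileOutputStream
            copyExactly(in, file, size, buf);
            result.add(file.toByteArray());
        }
        return result;
    }

    // Copies exactly `size` bytes in chunks; readFully throws EOFException if the input runs short.
    static void copyExactly(ObjectInputStream in, OutputStream out, long size, byte[] buf) throws IOException {
        while (size > 0) {
            int len = (int) Math.min(size, buf.length);
            in.readFully(buf, 0, len);
            out.write(buf, 0, len);
            size -= len;
        }
    }
}
```

Copying in fixed-size chunks keeps memory use bounded even for large files, while the EOFException from readObject() marks the clean end of the input.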

Colonel Thirty Two
user207421

You could try NIO instead:

FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);

This maps only the first SIZE bytes of the file (starting at position 0) into memory.
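Extending the NIO idea, here is a sketch that maps the whole file and walks it record by record. It assumes 8-byte big-endian sizes written with DataOutputStream.writeLong (not serialized Long objects), an input smaller than 2 GB so that a single mapping covers it, and hypothetical output names 0.tst, 1.tst, and so on.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class MappedExtractor {

    // Splits a file of [8-byte size][payload] records into separate output files in outDir.
    public static List<Path> extract(Path input, Path outDir) throws IOException {
        List<Path> outputs = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(input, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int index = 0;
            while (map.remaining() >= Long.BYTES) {
                long size = map.getLong(); // big-endian by default, like DataOutputStream.writeLong
                if (size < 0 || size > map.remaining()) {
                    throw new IOException("Truncated or corrupt record " + index);
                }
                ByteBuffer payload = map.slice(); // view starting at the payload
                payload.limit((int) size);
                Path target = outDir.resolve(index + ".tst");
                try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                    while (payload.hasRemaining()) {
                        out.write(payload); // write() may be partial, so loop
                    }
                }
                map.position(map.position() + (int) size); // skip past the payload
                outputs.add(target);
                index++;
            }
            if (map.hasRemaining()) {
                throw new IOException("Trailing bytes after the last record");
            }
        }
        return outputs;
    }
}
```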

B

Software Engineer

This uses DataInputStream to read the longs. I am not using readFully() here because a segment might be too long to keep in memory:

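DataInputStream in = new DataInputStream(new FileInputStream("C:\\Test.tst"));
byte[] buf = new byte[64*1024];
while(true) {
  long size;
  try { size = in.readLong(); } catch (EOFException e) { break; } // no further size header
  OutputStream out = ...; // open the output only after a size was read
  while(size > 0) {
    int len = (int) Math.min(size, buf.length); // size is a long; clamp before narrowing to int
    len = in.read(buf, 0, len);
    if (len < 0) throw new EOFException("segment truncated"); // read() returns -1 at end of stream
    out.write(buf, 0, len);
    size -= len;
  }
  out.close();
}
in.close();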
eckes

Save yourself a lot of trouble by doing one of these things:

  1. Switch to using Avro; trust me, you would be crazy not to. It's easy to learn and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema, your old files are garbage.
  2. or use Thrift
  3. or use Hibernate (but this is probably not a great option: Hibernate takes a lot of time to learn and a lot of configuration)

If you really refuse to switch to Avro, I recommend reading up on the IOUtils class from Apache Commons IO. It has a method to copy from an input stream to an output stream, saving you a lot of headaches. Unfortunately, what you want to do is a little more complicated: you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that.
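A sketch of the SequenceInputStream idea for the writing side: each body is preceded by an 8-byte size header, and a plain copy loop stands in for IOUtils.copy (on Java 9+, InputStream.transferTo does the same job). The names are hypothetical, the header uses ByteBuffer.putLong (big-endian, matching DataOutputStream.writeLong), and in-memory byte arrays stand in for the files; with real files you would need each file's size up front, for example from Files.size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SizePrefixedWriter {

    // Builds one stream that yields size1 || body1 || size2 || body2 ...
    public static InputStream concatenate(List<byte[]> bodies) {
        List<InputStream> parts = new ArrayList<>();
        for (byte[] body : bodies) {
            byte[] header = ByteBuffer.allocate(Long.BYTES).putLong(body.length).array();
            parts.add(new ByteArrayInputStream(header));
            parts.add(new ByteArrayInputStream(body));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Plain copy loop standing in for IOUtils.copy; returns the number of bytes copied.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```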

There are also GZIPOutputStream and ZipOutputStream; both ship with the JDK in java.util.zip, so no extra jars are needed.
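With ZipOutputStream the size prefixes go away entirely: every file becomes a named entry, ZipInputStream.read() returns -1 at the end of each entry, and getNextEntry() returns null when the archive ends. A minimal in-memory sketch; the class name and entry names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipBundle {

    // Stores each file as its own named entry; the zip format records the lengths itself.
    public static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Reads the entries back in archive order; no available() calls or size headers needed.
    public static Map<String, byte[]> unpack(byte[] archive) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buf)) != -1) { // -1 marks the end of the current entry
                    content.write(buf, 0, n);
                }
                files.put(entry.getName(), content.toByteArray());
            }
        }
        return files;
    }
}
```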

I'm not going to write an example because I honestly think you should just learn Avro or Thrift and use that.

msknapp