I need to read a binary file in Java and split it up (it's actually a binary file containing many PDF files, with a single line of "metadata" before each).
Each PDF item in the binary file ends with a "%%EOF" marker.
In my first attempt I read the file line by line as UTF-8 text, but this corrupted the binary data:
    reader = new BufferedReader(new InputStreamReader(new FileInputStream(binaryFile), "UTF-8"));
    String mdmeta;
    while ((mdmeta = reader.readLine()) != null) {
        System.out.println("read file metadata: " + mdmeta);
        writeToFile("exploded-file-123");
    }
and the method writeToFile:
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fullFilename), "UTF-8"));
    writer.write("%PDF-1.4\r\n");
    String line;
    while ((line = reader.readLine()) != null) {
        writer.write(line);
        writer.write("\r\n");
        if ("%%EOF".equals(line)) {
            writer.flush();
            return;
        }
    }
Although this splits the file up into exploded items, the resulting files are corrupt (almost certainly because I read and wrote the bytes as UTF-8 strings).
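To show the kind of corruption I mean, here is a minimal sketch (the class and method names are made up for illustration). A byte that is not valid UTF-8, such as 0xFF, is decoded to the replacement character U+FFFD, and encoding that character back yields three different bytes. Separately, readLine() discards the original line terminators, so writing "\r\n" back does not necessarily reproduce them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8RoundTrip {
    // Hypothetical helper: decode raw bytes as UTF-8 text, then encode the text back to bytes.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8, but is perfectly normal in binary data.
        byte[] raw = {(byte) 0xFF};
        byte[] back = roundTrip(raw);
        System.out.println("unchanged after round trip: " + Arrays.equals(raw, back));
        System.out.println("length before: " + raw.length + ", after: " + back.length);
    }
}
```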
I think I need a lower-level approach using InputStreams.
It gets complicated since the files can be large. Suppose I use a buffer: I read bytes from the file to fill it, then I need to look for "%%EOF" inside the buffer and manually split its contents between the previous exploded item and the next one. And if "%%EOF" straddles the buffer edge, I might miss the file boundary completely.
I guess I'm looking for some way to readBytesUpUntil("%%EOF"). Is there an easy way to do this?
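To make the idea concrete, here is a rough sketch of what I imagine readBytesUpUntil could look like. It is not a tested solution. The names MarkerSplitter and copyUpToMarker are invented for illustration, and it assumes the marker is non-empty and that the first "%%EOF" really ends each item (a PDF that has been incrementally updated can contain more than one). It reads one byte at a time through a BufferedInputStream and keeps the match state in a variable, so a marker that straddles an internal buffer refill should still be found. The small failure table handles inputs like "%%%EOF", where a naive "reset to zero on mismatch" would miss the marker.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class MarkerSplitter {
    // Copies bytes from in to out, up to and including the first occurrence of marker.
    // Returns true if the marker was found, false if the stream ended first.
    // Assumes marker.length > 0.
    static boolean copyUpToMarker(InputStream in, OutputStream out, byte[] marker) throws IOException {
        int[] fail = failureTable(marker);
        int matched = 0; // number of marker bytes matched so far
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            // On mismatch, fall back to the longest marker prefix that is still matched.
            while (matched > 0 && marker[matched] != (byte) b) {
                matched = fail[matched - 1];
            }
            if (marker[matched] == (byte) b) {
                matched++;
            }
            if (matched == marker.length) {
                return true;
            }
        }
        return false;
    }

    // Standard Knuth-Morris-Pratt failure table for the marker bytes.
    static int[] failureTable(byte[] p) {
        int[] fail = new int[p.length];
        int k = 0;
        for (int i = 1; i < p.length; i++) {
            while (k > 0 && p[i] != p[k]) {
                k = fail[k - 1];
            }
            if (p[i] == p[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        // In-memory stand-in for the real file; a FileInputStream would be wrapped the same way.
        byte[] data = "AAA%%%EOFBBB%%EOF".getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                ByteArrayOutputStream item = new ByteArrayOutputStream();
                boolean found = copyUpToMarker(in, item, marker);
                if (item.size() > 0) {
                    System.out.println("item of " + item.size() + " bytes, marker found: " + found);
                }
                if (!found) {
                    break;
                }
            }
        }
    }
}
```

In the real program the ByteArrayOutputStream would presumably be a FileOutputStream for each exploded item, and the metadata line before each PDF would have to be read as raw bytes from the same stream rather than through a Reader.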