
I'm having a problem with a new file format I'm being asked to implement at work.

Basically, the file starts with a number of text headers, encoded in UTF-8, that describe the data, and the rest of the file is the numerical data in binary. I can write the data and read it back just fine, and I recently added the code to write the headers. The problem is that I don't know how to read a file that contains both text and binary data. I want to be able to read in and deal with the header information (which is fairly extensive) and then continue reading the binary data without having to re-iterate through the headers. Is this possible?

I am currently using a FileInputStream to read the binary data, but I don't know how to start it at the beginning of the data rather than at the beginning of the whole file. One of FileInputStream's constructors takes a FileDescriptor as its argument, and I think that's my answer, but I don't know how to get one from another file-reading class. Am I approaching this correctly?

DementedDr

1 Answer


You can reposition a `FileInputStream` to any arbitrary point by getting its channel via `getChannel()` and calling `position()` on that channel.
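A minimal sketch of the channel-positioning approach, assuming the byte offset where the binary section starts (`dataOffset`) is already known and that the data is a run of big-endian `double` values. Both are assumptions, and the class and method names are hypothetical. The stream is positioned first and only then wrapped, so nothing has read ahead from the start of the file.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class SeekExample {
    /**
     * Reads `count` big-endian doubles starting at byte `dataOffset`.
     * Assumes the caller already knows where the binary section begins.
     */
    static double[] readDoublesAt(String path, long dataOffset, int count) throws IOException {
        try (FileInputStream fis = new FileInputStream(path)) {
            // Moving the channel also moves the stream: they share one file position.
            fis.getChannel().position(dataOffset);
            // Wrap after positioning; a buffering wrapper created earlier
            // could already hold bytes from before the offset.
            DataInputStream in = new DataInputStream(fis);
            double[] values = new double[count];
            for (int i = 0; i < count; i++) {
                values[i] = in.readDouble(); // throws EOFException if the file is too short
            }
            return values;
        }
    }
}
```

The Javadoc for `FileInputStream.getChannel()` states that changing the channel's position changes the stream's file position, which is why no separate seek call on the stream itself is needed.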

The one caveat is that this position affects all consumers of the stream. It is not suitable if you have different threads (for example) reading from different parts of the same file. In that case, create a separate FileInputStream for each consumer.

Also, this technique only works for file streams, because the underlying file can be randomly accessed. There is no equivalent for sockets, or named pipes, or anything else that is actually a stream.

parsifal
  • It doesn't sound to me as if he knows the position to which he must seek before he can read the binary data. – jarnbjo Mar 20 '13 at 18:50
  • @jarnbjo - Based on the end of his first paragraph it seems that he does know where the data begins. But it's entirely possible that he's using something like a `BufferedReader` to process the text and so the underlying file position is invalid. Perhaps if/when he returns we'll get clarification. – parsifal Mar 20 '13 at 18:59
  • AFAIK, it is not possible with the standard API to wrap an InputStream in a Reader and alternately read characters from the Reader and binary data from the InputStream. Even a "plain" InputStreamReader may consume more raw data from the underlying stream than is actually converted to characters and returned by the read methods. – jarnbjo Mar 20 '13 at 19:09
  • @jarnbjo - You're correct, but there are alternatives. For example, you can write a stream decorator that reads until it hits a newline and then converts the bytes to a string. – parsifal Mar 20 '13 at 20:07
  • If you only have to support a fixed encoding like UTF-8 and the text part consists of one or more text lines (each ending with a newline), that is of course an option. Supporting other encodings, or having a more complex delimiter between the text and the binary data, makes it much more complex than it may sound. – jarnbjo Mar 20 '13 at 23:23
  • @jarnbjo - Any problem can be made infinitely complex. In my experience, most aren't. However, since the OP clearly shows no interest in clarifying his question, I don't plan to invest any more thought in it. – parsifal Mar 21 '13 at 13:00
  • Sorry for the late reply, I was really busy over the weekend. I know the format of the header, but there is a section whose length I don't know, so I don't know exactly where the data starts. – DementedDr Mar 25 '13 at 18:11
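Since the header length isn't known in advance, here is a rough sketch of the stream-decorator idea from the comments: read the header one line at a time as raw bytes from the same `InputStream`, decode each line as UTF-8, and then keep reading binary data from that same stream, so no seek is needed. It assumes lines end with `'\n'`, that the header is terminated by a blank line, and that the data is big-endian `double` values. The terminator is a hypothetical convention and the real format's will likely differ; all names are also hypothetical. Splitting on the `'\n'` byte is safe for UTF-8 because that byte never occurs inside a multi-byte UTF-8 sequence. As noted above, other encodings (UTF-16, for example) would need more care.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MixedFileReader {
    /**
     * Reads bytes up to (not including) the next '\n' and decodes them as UTF-8.
     * A trailing '\r' is dropped. Returns null at end of stream if nothing was read.
     * It never consumes bytes past the newline, so binary data after it is untouched.
     */
    static String readUtf8Line(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            buf.write(b);
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    /** Collects header lines until a blank line (hypothetical terminator) or end of stream. */
    static List<String> readHeader(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = readUtf8Line(in)) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    /** Reads big-endian doubles until end of stream; a trailing partial value is dropped. */
    static List<Double> readDoubles(DataInputStream in) throws IOException {
        List<Double> values = new ArrayList<>();
        while (true) {
            try {
                values.add(in.readDouble());
            } catch (EOFException e) {
                return values;
            }
        }
    }

    /** Usage: one buffered stream serves both the text header and the binary data. */
    static List<Double> readFile(String path, List<String> headerOut) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            headerOut.addAll(readHeader(in));
            return readDoubles(in);
        }
    }
}
```

Buffering is harmless here because the header and the data are both read through the same `BufferedInputStream`. The problem with a `Reader` described in the comments is that it pulls bytes out of the stream that the binary-reading code then never sees.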