
The question may be generic but I am trying to understand the major implications here.

I am trying to do some bytecode engineering using the BCEL library, and part of the workflow requires me to read the same bytecode file multiple times (from the beginning). The flow is the following:

// 1. Get Input Stream

// 2. Do some work

// 3. Finish

// 4. Do some other work.

At step 4, I will need to reset to the mark or obtain the stream as though reading from the beginning. I know of the following choices:

1) Wrap the stream using BufferedInputStream - chance of getting "Resetting to invalid mark" IOException

2) Wrap it using ByteArrayInputStream - it always works even though some online research suggests that it's erroneous?

3) Simply call getInputStream() if I need to read from the stream again.

I am trying to understand which option would be better for me. I don't want to use BufferedInputStream because I have no clue where mark() was last called, so calling reset() may throw an IOException. I would prefer using ByteArrayInputStream since it requires the minimum code change for me, but could anyone suggest whether option 2 or option 3 would be better?

I know that implementations for mark() and reset() are different for ByteArrayInputStream and BufferedInputStream in JDK.

Regards

ha9u63a7
  • So basically you are reading a `File` that turns out to be a `.class` file? Why not read it once and store that in a byte array, for example? – Eugene Nov 20 '17 at 11:47
  • @Eugene Yes sir, I am doing that already. It works, but I am a believer in "expert opinion" and am hoping to learn of any catch which I might not have considered. Do you know of anything? – ha9u63a7 Nov 20 '17 at 11:48
  • I only know that using mark and reset is important when you don't want to read your entire input stream; or you want to read the next couple of bytes in order to know what to do next; otherwise reading it into a array is the simplest (and clearest for me) – Eugene Nov 20 '17 at 11:50
  • @Eugene okay that helps. So you are saying that calling reset() for `ByteArrayInputStream` should correctly reset it back to position 0 (i mean, beginning) of the stream if I want to restart? i.e. there should not be any erroneous behaviour (because I haven't encountered anything so far ) ? – ha9u63a7 Nov 20 '17 at 11:52
  • Well, reset will reset it (if `markSupported()` returns true) to whatever mark was set, not necessarily zero; that's probably what you meant. But yes, if this is supported – Eugene Nov 20 '17 at 11:54

1 Answer


The problem of mark/reset is not only that you have to know in advance the maximum amount of data being read between these calls, you also have to know whether the code you’re delegating to will use that feature for itself internally, rendering your mark obsolete. It’s impossible for code using mark/reset to remember and restore a previous mark for the caller.

So while it would be possible to fix the maximum issue by specifying the total file size as the readlimit, you can never rely on a working mark when passing the InputStream to an arbitrary library function, unless its documentation explicitly guarantees that it never uses the mark/reset feature internally.
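To illustrate how a callee can silently replace the caller's mark, here is a minimal sketch. `libraryPeek` is a hypothetical stand-in for library code that uses mark/reset internally; the class name and the sample bytes are likewise made up for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class MarkClobberDemo {

    // Hypothetical stand-in for a library method that peeks ahead using mark/reset.
    static void libraryPeek(InputStream in) throws IOException {
        in.mark(4);   // replaces whatever mark the caller had set
        in.read();    // look at one byte ...
        in.reset();   // ... and rewind to the library's own mark
    }

    // Returns the first byte read after the caller's reset().
    static int firstByteAfterCallerReset() {
        byte[] data = {10, 20, 30, 40, 50};
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            in.mark(data.length);  // caller marks position 0
            in.read();             // consumes 10, now at position 1
            libraryPeek(in);       // library marks position 1 and rewinds to it
            in.reset();            // goes back to position 1, not 0
            return in.read();      // 20, whereas the caller expected 10
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstByteAfterCallerReset()); // prints 20
    }
}
```

No exception is thrown here; the caller simply ends up at the wrong position, which is arguably worse than the "Resetting to invalid mark" failure.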

Also, a BufferedInputStream getting a readlimit matching the total file size would not be more efficient than a ByteArrayInputStream wrapping an array holding the entire file, as both end up maintaining a buffer of the same size.


The best solution would be to read the entire class file into an array once and directly use the array, e.g. for code under your control or when you have a choice regarding the library (ASM’s ClassReader supports using a byte array instead of an InputStream, for example).

If you have to feed an InputStream to a library function insisting on it, like BCEL, then wrap the byte array into a ByteArrayInputStream when needed, but create a new ByteArrayInputStream each time you have to re-parse the class file. Constructing a new ByteArrayInputStream costs next to nothing, as it is only a lightweight wrapper around the array, and it is reliable, as it does not depend on the state of an older input stream in any way. You could even have multiple ByteArrayInputStream instances reading the same array at the same time.
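A minimal sketch of that approach, assuming the class file fits comfortably in memory. `sumOfBytes` is a hypothetical stand-in for whatever parser insists on an InputStream (with BCEL, this would presumably be where the stream is handed to its `ClassParser`); the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RereadDemo {

    // Reads the whole class file once; the caller keeps the array around.
    static byte[] load(Path classFile) throws IOException {
        return Files.readAllBytes(classFile);
    }

    // Hypothetical stand-in for a parser that insists on an InputStream.
    static int sumOfBytes(InputStream in) {
        try {
            int sum = 0;
            for (int b = in.read(); b != -1; b = in.read()) {
                sum += b;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each pass gets its own stream, so no pass depends on the state of another.
    static int[] parseTwice(byte[] classBytes) {
        int first = sumOfBytes(new ByteArrayInputStream(classBytes));
        int second = sumOfBytes(new ByteArrayInputStream(classBytes));
        return new int[] {first, second};
    }
}
```

Because every pass starts from a fresh stream at position 0, there is no need to call mark() or reset() at all.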

Calling getInputStream() again would be an option if you had to deal with really large files for which buffering the entire contents is not feasible; however, this is not the case for class files.

Holger
  • I am actually wrapping the file contents (byte[]) into `ByteArrayInputStream` and passing that on - minimum change in API - this works for me. Looks like both yourself and @Eugene have got this in your comments. I am happy with this for now. – ha9u63a7 Nov 20 '17 at 12:02