0

I was trying to analyse a recently-found issue in my application and realized that my inputStream.reset() method fails because I was trying to operate on FileInputStream.

It seems that my method call For apache's DiskFile.getInputStram() returns ByteArrayInputStream instance (mark supported) or FileInputStream (mark NOT supported) instance based on certain file size threshold.

The code I have to get this inputstream is:

FormFile file = multipartForm.getFiles().get(0); // It's always one file
InputStream is = file.getInputStream();

// Read the stream and did job
// Now I want to reset it.
// bad coding from my side because I didn't check markSupported

is.reset();

// Got IO error immediately after this. But anything below 256KB is ok

I am sure this is mentioned/explained somewhere in Oracle JDK docs or apache's site. but cannot seem to remember any references. Does anyone know if this behaviour makes sense?

Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
ha9u63a7
  • 6,233
  • 16
  • 73
  • 108
  • so `DiskFile.getInputStram()` returns an `InputStream` and the underlying implementation of that is either `ByteArrayInputStream` or `FileInputStream ` depending on the size of the File? – Eugene Nov 17 '17 at 09:31
  • yes - and if my tests are correct then anything above 256KB on Windows will return FileInputStream implementation, but lower size will be ByteArrayInputStream. – ha9u63a7 Nov 17 '17 at 09:36
  • can you check the sources for that? I am looking here for example https://svn.apache.org/repos/asf/struts/struts1/tags/STRUTS_1_2_1/src/share/org/apache/struts/upload/DiskFile.java and this is not the case – Eugene Nov 17 '17 at 09:37
  • @Eugene I have checked the sources before posting this question - and that's why I am even more confused. All my files are actually class files (bytecode). And anything above 256KB is having this switchover. – ha9u63a7 Nov 17 '17 at 09:42
  • 1
    ok, let's suppose this is true - why can't you do an `instanceOf` check then before calling `reset`? I mean they return an `InputStream` - meaning they can change the return type *at any time and any version*... – Eugene Nov 17 '17 at 09:44
  • @Eugene point noted - but I want to understand why this switchover isn't documented as "Platform/OS dependent" issue? What if I go to Linux or other OS where this will not happen? Before I go and hit the code, I would have liked to get a documentation for this, but looks like it isn't there so far. – ha9u63a7 Nov 17 '17 at 09:47
  • I do agree this is confusing, but I don't have a solution for it. You *could* go to their JIRA and specifically ask them to include in the docs; may be this just "escaped" – Eugene Nov 17 '17 at 10:04
  • btw that's very interesting but from the 2 version of `DiskFile` that I found `1.1.0` and `1.2.9` they *both* return JUST `FileInputStream`. what version are you using? – Eugene Nov 17 '17 at 10:24
  • @Eugene it's 1.1.0 and I am using it as a dependency `jar` - unfortunately cannot move to 2.0 or higher - it's a legacy system-related restriction – ha9u63a7 Nov 17 '17 at 10:33
  • then this makes no sense... http://grepcode.com/file/repository.springsource.com/org.apache.struts/com.springsource.org.apache.struts/1.1.0/org/apache/struts/upload/DiskFile.java#DiskFile.getInputStream%28%29 – Eugene Nov 17 '17 at 10:34
  • 1
    @Eugene: perhaps, the `FormFile` is actually a [`CommonsFormFile`](http://grepcode.com/file/repository.springsource.com/org.apache.struts/com.springsource.org.apache.struts/1.1.0/org/apache/struts/upload/CommonsMultipartRequestHandler.java#611) which delegates to [`DiskFileItem.getInputStream()`](http://grepcode.com/file/repository.springsource.com/org.apache.commons/com.springsource.org.apache.commons.fileupload/1.2.1/org/apache/commons/fileupload/disk/DiskFileItem.java#227)… – Holger Nov 17 '17 at 13:16

1 Answers1

2

I’m not familiar with the Struts API, but to me, it seems that when a return type is InputStream rather than a specific subclass, then you have no guarantees regarding the actual type of the returned stream.

Since calling reset() is only valid when having a preceding mark(readlimit) call, treating an unspecified InputStream type generically is straight-forward:

InputStream inputStream = …
int readlimit = …

if(!inputStream.markSupported()) {
    inputStream = new BufferedInputStream(inputStream, readlimit);
}

inputStream.mark(readlimit);
// read some date, not more than readlimit
inputStream.reset();
Holger
  • 285,553
  • 42
  • 434
  • 765
  • I am not sure BufferedInputStream will always work. I have seen instances where IOException were thrown because reset happend to an Invalid Mark. it would be safer to just have ByteArrayInputStream if the content bytes are available. – ha9u63a7 Nov 17 '17 at 13:15
  • 2
    Then it’s your fault. You *must not* read more than the specified `readLimit` number of bytes, otherwise the mark is invalid and `reset()` not supported anymore. You may get away with erroneous behavior with `ByteArrayInputStream` which does not check the `readLimit`, but reading more bytes than specified before calling `reset()` is still an invalid usage. Of course, you could read everything into a `byte[]` array first, but why bother with `mark()` and `reset()` then? You don’t need to deal with an `InputStream`, if your code works with an existing array only. Just use the array. – Holger Nov 17 '17 at 13:19
  • It's still incorrect because reset works for ByteArrayInputStream() at least for all the cases I have tried so far - and what is this "readLimit" and how do you propose we set that every time? Is this the total number of bytes are you referring to or something else? – ha9u63a7 Nov 17 '17 at 13:23
  • 1
    @ha9u63ar: it’s backed by [the specification](https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#reset--): “If the method mark has not been called since the stream was created, or the number of bytes read from the stream since mark was last called is larger than the argument to mark at that last call, then an IOException **might** be thrown.” Throwing is not mandatory. Though I would appreciate if `ByteArrayInputStream` would throw as well, just to help avoiding code that only works with `ByteArrayInputStream`… – Holger Nov 17 '17 at 13:26
  • since I won't know where to set the readLimit - I believe it's better to use ByteArrayInputStream for now. From what I see on the web, this is recommended if I have no control over minimum/maximum bytes to read before mark is invalid. If you look at the underlying implementation of ByteArrayInputStream and BufferedInputStream - the mark() and reset() implementations are different - so I don't see how they might be consistent. – ha9u63a7 Nov 17 '17 at 13:39
  • 1
    Seems like an xy-problem. Why do you need mark/reset at all? When you want to enforce the use of a `ByteArrayInputStream`, you have to read all bytes into an array first, which implies knowing the size. However, once you have that array, you have already done everything you can do with an `InputStream`, all that `mark`/`reset` offers, is to do the same thing again, to read the data again into an array, which makes no sense if you already have an array containing all data. Even if you really need copies, you can copy the array… – Holger Nov 17 '17 at 13:50
  • 1
    And when all you want to do, is to reset the stream to the beginning, you could just call `file.getInputStream()` again, and you have an input stream that will read again from position zero… – Holger Nov 17 '17 at 13:54