BASICS

This is a Java 1.8, Spring Boot 1.5 application.

It currently uses Apache Tika 1.22 to read MIME type information, but this can easily be changed.

SUMMARY

There is a request mapping that the user calls to download files. These files come from an external URL, separate from the application. A file may be any of a variety of types (Excel, PDF, text, etc.), and the application has no way of knowing which until it pulls the file down.

ISSUE

To return the file download to the user with the appropriate title, extension, and Content-Type, the application uses Apache Tika to detect that information. Unfortunately, detection consumes the first bytes of the InputStream, so when the application then writes the same InputStream to the HttpServletResponse, the file is incomplete.

As a workaround, the application currently closes the first InputStream and then opens a second InputStream to return to the user.

That's not good, because it means the URL is fetched twice, wasting system resources.

What is the proper way to do this?

CODE EXAMPLE

    @GetMapping("/My/Download/")
    public void doDownload(HttpServletResponse httpServletResponse) {

        String externalFileURL = "http://www.pdf995.com/samples/pdf.pdf";

        try {
            // First request: opened only so Tika can detect the media type.
            InputStream firstStream = new URL(externalFileURL).openStream();
            TikaConfig tikaConfig = new TikaConfig();
            MediaType mediaType = tikaConfig.getDetector().detect(TikaInputStream.get(firstStream), new Metadata());
            firstStream.close();

            // Second request: the same URL is fetched again to send the complete file.
            InputStream secondStream = new URL(externalFileURL).openStream();
            httpServletResponse.setHeader("Content-Disposition", String.format("attachment; filename=\"%s\"", "DownloadMe." + mediaType.getSubtype()));
            httpServletResponse.setContentType(mediaType.getBaseType().toString());
            FileCopyUtils.copy(secondStream, httpServletResponse.getOutputStream());
            httpServletResponse.flushBuffer();
        } catch (Exception e) {
            // Exceptions are silently ignored here.
        }
    }
Miss Kitty

1 Answer

Javadoc of detect() says:

The given stream is guaranteed to support the mark feature and the detector is expected to mark the stream before reading any bytes from it, and to reset the stream before returning.

Javadoc of TikaInputStream says:

The created TikaInputStream instance keeps track of the original resource used to create it, while behaving otherwise just like a normal, buffered InputStream. A TikaInputStream instance is also guaranteed to support the mark(int) feature.

Which means you should use TikaInputStream to read the content, and try-with-resources to close it:

    try (InputStream tikaStream = TikaInputStream.get(new URL(externalFileURL))) {
        TikaConfig tikaConfig = new TikaConfig();
        MediaType mediaType = tikaConfig.getDetector().detect(tikaStream, new Metadata());

        httpServletResponse.setHeader("Content-Disposition", String.format("attachment; filename=\"%s\"", "DownloadMe." + mediaType.getSubtype()));
        httpServletResponse.setContentType(mediaType.getBaseType().toString());
        FileCopyUtils.copy(tikaStream, httpServletResponse.getOutputStream());
        httpServletResponse.flushBuffer();
    }
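
To illustrate the mark/reset contract quoted above without depending on Tika, here is a minimal sketch that uses only the JDK. The names `MarkResetSketch`, `sniffType`, and `copy` are hypothetical, and the magic-byte check is only a stand-in for a real detector, not Tika's actual implementation. It shows a single stream being peeked at, rewound, and then copied in full:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetSketch {

    // Hypothetical stand-in for a detector: reads the first bytes, then rewinds the stream.
    public static String sniffType(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset not supported");
        }
        byte[] pdfMagic = "%PDF".getBytes(StandardCharsets.US_ASCII);
        in.mark(pdfMagic.length);          // remember the current position
        try {
            byte[] head = new byte[pdfMagic.length];
            int total = 0;
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;                 // stream ended before the header was complete
                }
                total += n;
            }
            boolean isPdf = total == head.length && Arrays.equals(head, pdfMagic);
            return isPdf ? "application/pdf" : "application/octet-stream";
        } finally {
            in.reset();                    // rewind so the caller sees the stream from the start
        }
    }

    // Copies everything that remains in the stream to the output.
    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample payload; in the question this would be the remote file.
        byte[] payload = "%PDF-1.4 sample body".getBytes(StandardCharsets.US_ASCII);
        // BufferedInputStream adds mark/reset support to streams that lack it.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(payload))) {
            String type = sniffType(in);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copy(in, out);
            System.out.println(type + " " + out.size() + " bytes");
        }
    }
}
```

Because the peek happens between `mark` and `reset`, the later copy still starts at the first byte, so the source only has to be opened once.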
Andreas
  • Thank you for your time and effort! However, this does not function. The error "java.io.IOException: mark/reset not supported" is returned when using the example above, unfortunately. I'd appreciate your help if you have any other thoughts! – Miss Kitty Sep 22 '19 at 02:00
  • Ah, I figured it out. Your answer works, HOWEVER, one must add another line making it a BUFFERED Input Stream. That is, InputStream bufferedIn = new BufferedInputStream(tikaStream); Then you use bufferedIn instead. Thanks! (Note: I marked this as correct, because it mostly is, but for the sake of future people out there, you might want to edit it to add the BufferedInputStream bit!) – Miss Kitty Sep 22 '19 at 02:04
  • So `TikaInputStream` is violating its own contract. You should report that to them. Try `InputStream tikaStream = new BufferedInputStream(TikaInputStream.get(new URL(externalFileURL)))` – Andreas Sep 22 '19 at 02:05