
I am currently implementing a file upload service in Java, using Jetty as my servlet container. I am facing an issue I would like to fix.
I have a POST endpoint for file uploads that accepts multipart/form-data.
This works fine for small files, but becomes a headache when a user uploads a big file. If I am not mistaken, Jetty buffers the uploaded file before forwarding it to my FileInputStream.
By this I mean that it fills its internal buffer first and only then writes the data to the FileInputStream. Is there a way to stop this and configure Jetty so that it relays the data directly to the FileInputStream instead of buffering it first?
I already tried wrapping the input stream in a buffered one, but Jetty still consumes the file first, buffers it, and then writes it to the InputStream.
After some research, I saw a comment suggesting to use PUT instead and then access the raw data to achieve this direct forwarding. But I was wondering if there is another, perhaps even better, way.
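The raw-data approach mentioned above amounts to reading the request body stream yourself and copying it to disk in fixed-size chunks, so memory use stays bounded no matter how large the upload is. Below is a minimal sketch using only the JDK, assuming the caller passes in the body stream (in a servlet, `request.getInputStream()`). The class and method names (`RawUpload`, `streamToFile`) are hypothetical. Note that with a multipart POST the raw body also contains boundary lines and part headers, so this only produces a clean file when the body is the file itself, as in the PUT approach.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RawUpload {
    // Hypothetical helper: copies a raw request body to disk chunk by chunk,
    // so at most one buffer's worth of data is held in memory at a time.
    public static long streamToFile(InputStream body, Path target) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        // newOutputStream creates the file, or truncates it if it already exists
        try (OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = body.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total; // number of bytes written
    }
}
```

`Files.copy(InputStream, Path, CopyOption...)` does the same in a single call (pass `StandardCopyOption.REPLACE_EXISTING` if the target may already exist); the explicit loop just makes the bounded buffer visible.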

Regards Artur

Ichigo Kurosaki
Artur K.

1 Answer


When working with large files in multipart/form-data requests, make sure Jetty is using the most recent multipart parser, the RFC7578 compliance mode, as it's the most performant.

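```java
import org.eclipse.jetty.server.HttpConfiguration;
import org.eclipse.jetty.server.HttpConnectionFactory;
import org.eclipse.jetty.server.MultiPartFormDataCompliance;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

Server server = new Server();
// Select the RFC 7578 multipart/form-data parser for this connector
HttpConfiguration httpConfig = new HttpConfiguration();
httpConfig.setMultiPartFormDataCompliance(MultiPartFormDataCompliance.RFC7578);
ServerConnector connector = new ServerConnector(server,
    new HttpConnectionFactory(httpConfig));
server.addConnector(connector);
```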

Next, when you use the HttpServletRequest.getPart(String) method, be aware that Jetty internally has to parse the whole multipart/form-data body so that its contents are accessible through both the .getPart() and .getParameter() methods.

Accessing large files this way is actually covered by a unit test in Jetty itself.

See: HugeResourceTest.java

Joakim Erdfelt
  • This is handy, but it still doesn't answer the OP's question about not caching the entire request body before any of the parts can be accessed. Is there a mechanism where each part is available as soon as it's received, so the handler can consume it immediately instead of waiting for all the parts to arrive before `getParts()` returns? – notthetup Jun 16 '23 at 01:01
  • The APIs for `getParts()` and `getPart(String name)` depend on reading the entire request body before returning from those calls. You can configure the parts to be cached on disk or in memory, but the body still has to be read in full, as even a simple `.getPart("filename")` can fail if that part name is provided more than once, or not at all. – Joakim Erdfelt Jun 26 '23 at 20:36
  • Gotcha! Makes sense. My personal use case is uploading a large file (~500MB) to a Jetty instance running on a small server with limited RAM. I guess I could enable caching the parts to disk to help with memory pressure? How would one enable that? – notthetup Jun 28 '23 at 06:13
  • @notthetup use either the `@MultipartConfig` annotation or the `<multipart-config>` element in the web descriptor (`WEB-INF/web.xml`); they have the configuration options you are looking for. – Joakim Erdfelt Jun 28 '23 at 12:02
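For `@MultipartConfig`, the relevant attributes are `location` (directory for temporary part files), `fileSizeThreshold` (size in bytes after which a part is written to disk instead of being kept in memory), `maxFileSize` and `maxRequestSize`. As a rough illustration of what `fileSizeThreshold` does, here is a plain-JDK sketch of the spill-to-disk idea: bytes stay in memory until the threshold is exceeded, then everything is moved to a temporary file. The class `ThresholdSink` and its methods are hypothetical and only model the behaviour; this is not Jetty's implementation, and the annotation values in the comment are assumed examples.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of a part sink: small parts stay in memory, larger ones
// spill to a temp file once more than `threshold` bytes have been written.
// In a servlet, the real setting would look something like (values assumed):
//   @MultipartConfig(location = "/tmp/uploads", fileSizeThreshold = 1024 * 1024)
public class ThresholdSink extends OutputStream {
    private final int threshold;
    private final Path dir;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream file;   // non-null once the data has spilled to disk
    private Path filePath;

    public ThresholdSink(int threshold, Path dir) {
        this.threshold = threshold;
        this.dir = dir;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (file == null && memory.size() + len > threshold) {
            // Threshold exceeded: copy buffered bytes to a temp file, drop the memory buffer
            filePath = Files.createTempFile(dir, "part", ".tmp");
            file = Files.newOutputStream(filePath);
            memory.writeTo(file);
            memory = null;
        }
        if (file != null) {
            file.write(b, off, len);
        } else {
            memory.write(b, off, len);
        }
    }

    @Override
    public void close() throws IOException {
        if (file != null) {
            file.close();
        }
    }

    /** True if the data was written to disk rather than kept in memory. */
    public boolean isOnDisk() {
        return filePath != null;
    }

    public Path getFilePath() {
        return filePath;
    }

    /** The buffered bytes, or null if the data spilled to disk. */
    public byte[] getInMemoryBytes() {
        return memory == null ? null : memory.toByteArray();
    }
}
```

The trade-off is the same one the annotation exposes: a higher threshold avoids disk I/O for small parts, while a lower one keeps memory pressure down for large uploads on a small server.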