
In one of the Spring Boot applications I'm working on, there's a REST API with a POST endpoint that lets a user upload a file by putting its base64 encoding in the request body, as follows:

{
    "file": <base64 String>
}

This JSON gets mapped to a corresponding DTO object which then gets processed.

Now let's say I send in a file of about 1 MB. By the time the request reaches the REST controller, the heap has grown by up to 6x the size of the file, about 6 MB in this case. I measured this with jconsole / VisualVM. For larger files of about 30 MB I see the heap grow by about 90 MB.
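For a rough sense of where the growth could come from, here is a minimal sketch of the size arithmetic, assuming standard padded base64 (4 characters per 3 raw bytes) and a String stored at up to 2 bytes per character. The class name `Base64Footprint` and the helper `base64Length` are made up for illustration. The sketch does not measure which of these copies actually exist at the same time during a Spring/Jackson request.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Footprint {

    /** Number of characters in the padded base64 encoding of rawBytes bytes. */
    static long base64Length(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] file = new byte[1024 * 1024]; // stand-in for a 1 MB upload
        String encoded = Base64.getEncoder().encodeToString(file);

        System.out.println("raw bytes:            " + file.length);
        System.out.println("base64 chars:         " + encoded.length());
        // Base64 is pure ASCII, so the request body is one byte per character.
        System.out.println("request body bytes:   "
                + encoded.getBytes(StandardCharsets.US_ASCII).length);
        // Upper bound for a String holding the text: 2 bytes per char (UTF-16).
        // JDK 9+ compact strings can store ASCII-only text at 1 byte per char.
        System.out.println("String, 2 bytes/char: " + 2 * base64Length(file.length));
        // Decoding yields a byte[] of the original size again.
        System.out.println("decoded byte[]:       "
                + Base64.getDecoder().decode(encoded).length);
    }
}
```

If the request body buffer, the String and the decoded byte[] are all alive at once, the total would already be several times the file size, which is at least in the same range as the growth observed above.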

So I set a breakpoint in the controller method to see what is consuming the memory, but the data isn't easy to navigate, and all I found was the DTO and the JSON String representing the file.

What else could be consuming the memory, and is there a way to minimize it? Possibly changing the API itself?


EDIT : Possible duplicate of parsing to json using jackson-mapper causes

Bruno
  • 1
    Don't send files like that, or at least don't bind to a parameter, as that will read the string into memory at least 2 or 3 times (once base64-encoded, a byte[] for the decoded content, and a string based on that byte[]). Instead, I would suggest using a regular file upload and a streaming approach for reading it. – M. Deinum Mar 01 '22 at 15:16
  • Thanks for the comment and yes, I agree, it's just that the API is the way it is due to earlier design choices and changing it now would break client applications. – Bruno Mar 02 '22 at 10:09
  • 1
    Another option could be to not deserialize to an object, but rather use the Jackson streaming API to parse the request. This would reduce the memory usage, with the drawback of more complexity in the controller (and maybe backend services needing to handle a stream instead of the full file, etc.). – M. Deinum Mar 02 '22 at 12:27
  • I put this on hold and have now taken up the problem again. It seems more complicated to solve than anticipated: I need to read in base64-encoded data and send it through as-is to another REST service. Turning off the base64 decoding doesn't look trivial, and setting up a channel from the input stream to the output stream (which avoids ever storing the entire JSON) looks even more complicated. – Bruno Apr 04 '22 at 09:01
  • If you only need to send it to another API as-is, why is it more complicated than reading the input and directly sending it to the output? Unless you need to undo the base64 encoding, which would require reading everything before you can write. But that isn't clear from your comment. – M. Deinum Apr 05 '22 at 06:10
  • The input is a JSON where one of the fields has base64 encoded data (representing PDF files), the output too, but the JSONs aren't exactly the same. If I read the input by using Jackson's ObjectMapper, it automatically decodes the base64 field which causes an increase in memory usage since I need to re-encode the base64 for the outgoing JSON. – Bruno Apr 06 '22 at 08:15
  • Why? Why even read it and re-encode it? If you just need to send it somewhere else, send it as is... I don't see the problem with that. – M. Deinum Apr 06 '22 at 08:17
  • I'm not sure what you mean. The JSON for my API and the JSON for the API I send it to are different, so I can't send the entire structure as-is, only the base64 data enclosed in the incoming JSON. If I map it to a DTO and the field corresponding to the base64 field is of type `String` or `byte[]` then Jackson automatically decodes the base64. – Bruno Apr 06 '22 at 09:25
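
The pass-through idea from the comments (read the input and write the base64 straight to the output, without decoding it) can be sketched with plain `java.io` as below. This is an illustration under strong assumptions, not a JSON parser. It assumes the incoming body is the flat `{"file": "..."}` object from the question, that the value is a JSON string, and that it contains only base64 characters (so no escape sequences and no embedded quotes). The names `Base64PassThrough`, `copyFileField`, `skipPast` and the outgoing field name `document` are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class Base64PassThrough {

    /**
     * Copies the string value of the first "file" key from in to out in
     * fixed-size chunks, so the full base64 text is never held in memory.
     * Returns the number of characters copied.
     */
    static long copyFileField(Reader in, Writer out) throws IOException {
        skipPast(in, "\"file\""); // the key, including its quotes
        skipPast(in, "\"");       // opening quote of the value
        char[] buf = new char[8192];
        long copied = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '"') { // closing quote (base64 has no escapes)
                    out.write(buf, 0, i);
                    return copied + i;
                }
            }
            out.write(buf, 0, n);
            copied += n;
        }
        throw new IOException("unterminated string value");
    }

    /** Consumes characters until token has been read. Naive matcher that is
     *  adequate for the two tokens used above, not for arbitrary patterns. */
    static void skipPast(Reader in, String token) throws IOException {
        int matched = 0;
        while (matched < token.length()) {
            int c = in.read();
            if (c == -1) {
                throw new IOException("token not found: " + token);
            }
            if (c == token.charAt(matched)) {
                matched++;
            } else {
                matched = (c == token.charAt(0)) ? 1 : 0;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String incoming = "{ \"file\": \"SGVsbG8=\" }";
        StringWriter outgoing = new StringWriter();
        outgoing.write("{\"document\":\""); // hypothetical outgoing structure
        long n = copyFileField(new StringReader(incoming), outgoing);
        outgoing.write("\"}");
        System.out.println(outgoing + " (" + n + " base64 chars copied)");
    }
}
```

In a real controller the `Reader` would presumably wrap the request body and the `Writer` the outgoing request to the other service, both buffered. A streaming JSON parser such as the Jackson streaming API mentioned above would be the more robust way to locate the field; the sketch only shows that the base64 text itself can be forwarded chunk by chunk.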

0 Answers