
Problem

I have an .xlsx file with lots of formulas. I want to convert it into a new .xlsx file in which every formula is replaced by its computed value. Why? That is probably not relevant to this question.

What did I do?

My goal was to perform this task using as little heap memory as possible, so I used the Apache POI XSSF event API to read the source file and the SXSSF API to write the output file. It worked well.

Observations

All measurements were taken using JProfiler 10.

When I ran my code to convert a file of 25K+ rows (around 25K * 23 formulas), it used around 250 MB of heap space at its peak. I then ran the same command with -Xmx24M, and the code managed to run within this memory limit, which is significantly lower than the peak of the first run.

Questions

  • If my code can already run within this low memory limit, why did it take 250 MB+ of RAM in the first run?
  • Is it possible to keep this particular piece of code within a low memory limit even when -Xmx is not set?
Samiron

1 Answer


The garbage collector did not get around to freeing memory immediately in your first run. Processing XLSX files generates a lot of transient objects, so the first run lets them build up to around 250 MB before cleaning up, while the second, memory-constrained run is forced to clean up the objects sooner.
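To make the effect concrete, here is a minimal sketch (not the asker's POI code; the class and method names are made up for illustration). It allocates roughly 200 MB of short-lived buffers while keeping only about 1 KB reachable at any moment. Under those assumptions it should complete under a small -Xmx as well as under the default heap, and the "used heap" figure it prints mostly reflects garbage that has not been collected yet.

```java
public class TransientGarbageDemo {
    // Written on every iteration so the JIT cannot optimize the allocation away.
    static volatile byte[] sink;

    /** Heap currently in use as reported by the JVM (includes uncollected garbage). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Allocates count short-lived buffers and returns the total bytes requested. */
    static long allocateTransient(int count, int sizeBytes) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            sink = new byte[sizeBytes]; // the previous buffer is now unreachable
            total += sizeBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long before = usedHeapBytes();
        long allocated = allocateTransient(200_000, 1024); // about 200 MB in total
        long after = usedHeapBytes();

        System.out.printf("max heap:        %d MB%n", maxHeap >> 20);
        System.out.printf("allocated total: %d MB%n", allocated >> 20);
        System.out.printf("used before:     %d MB%n", before >> 20);
        System.out.printf("used after:      %d MB%n", after >> 20);
    }
}
```

Running it once with the default heap and once with -Xmx24M should show the same total allocation but different "used after" values, since the collector decides when to reclaim the dead buffers based on how much headroom it has.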

The garbage collector has many, many options and strategies that can be configured. Off the top of my head, the only way I know to limit consumption for only that specific code is to run that code only in its own JVM process, with appropriate GC parameters.
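A rough sketch of the separate-process approach using `ProcessBuilder` follows. The main class name `com.example.FormulaFlattener` and the file arguments are hypothetical placeholders for the conversion code; only the -Xmx flag comes from the question. The sketch assumes the child can reuse the parent's classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IsolatedJvmLauncher {

    /** Builds a command line that starts mainClass in a child JVM with a capped heap. */
    static List<String> buildCommand(String maxHeap, String mainClass, String... args) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-Xmx" + maxHeap);                        // heap cap applies to the child only
        command.add("-cp");
        command.add(System.getProperty("java.class.path"));   // assumption: same classpath as parent
        command.add(mainClass);
        command.addAll(Arrays.asList(args));
        return command;
    }

    /** Starts the child JVM, forwards its output to this console, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical class and file names; replace them with the real converter.
        List<String> command =
                buildCommand("24m", "com.example.FormulaFlattener", "in.xlsx", "out.xlsx");
        System.out.println(String.join(" ", command));
        // To actually launch the child: int exitCode = run(command);
    }
}
```

The parent JVM keeps its own heap settings, so only the conversion step is constrained.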

wmorrell
  • Do I actually gain anything by setting -Xmx24M and letting the GC be triggered more frequently, when memory is a concern? Is it OK to claim that limiting the max heap to 24M is the better solution in my case? What do you think? – Samiron Jun 27 '17 at 06:39
  • I would argue that you are making your application perform worse by constraining the memory like that. Garbage collection does come at a cost, and the default collection algorithm will suspend all code execution until the collection process is finished. I encourage you to try timing both configurations over several executions, for example with a shell script executing the program 10 times, run with `time`; I expect the code running under -Xmx24M to be slower due to the additional GC pauses. – wmorrell Jun 28 '17 at 00:31
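As a complement to wall-clock timing, the GC overhead can also be read from inside the JVM through the standard `GarbageCollectorMXBean` interface. The sketch below is illustrative only: the class name and the stand-in workload are assumptions, and the real conversion call would go where the lambda is.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCostProbe {
    static volatile byte[] sink; // keeps the sample allocations from being optimized away

    /** Total number of collections so far, summed over all collectors. */
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return count;
    }

    /** Approximate accumulated collection time in milliseconds, summed over all collectors. */
    static long totalGcTimeMillis() {
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            millis += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        return millis;
    }

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long gcCountBefore = totalGcCount();
        long gcTimeBefore = totalGcTimeMillis();

        // Stand-in workload: replace with the actual conversion call.
        long elapsed = timeMillis(() -> {
            for (int i = 0; i < 200_000; i++) {
                sink = new byte[1024];
            }
        });

        System.out.printf("elapsed:        %d ms%n", elapsed);
        System.out.printf("GC runs:        %d%n", totalGcCount() - gcCountBefore);
        System.out.printf("GC time approx: %d ms%n", totalGcTimeMillis() - gcTimeBefore);
    }
}
```

Running this with and without -Xmx24M should make any difference in collection count visible, though the numbers will vary between JVM versions and collectors.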