
I am trying to figure out if there is a 'simple' way to persistently store a large object instance in JVM memory so that it can be shared and reused across multiple runs by other programs. I am working in NetBeans using Java 8. The data is about 500 MB of serialized objects. They fit easily in RAM but take a few minutes to deserialize from disk each time.

Currently the program loads a serialized object from the local disk into memory on every run. Since the data is only read during the test, it would be optimal to keep it in memory and access it directly on each run.
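For reference, here is a minimal sketch of the per-run loading step described above. It assumes the data was written with a single `ObjectOutputStream.writeObject` call. The class name `SnapshotLoader`, its methods, and the 1 MiB buffer size are hypothetical. Wrapping the `FileInputStream` in a `BufferedInputStream` avoids issuing many small reads against the file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: writes and reads a whole object graph in one call.
final class SnapshotLoader {

    static void save(Serializable data, String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            out.writeObject(data);
        }
    }

    // Reads the graph back and reports how long it took.
    // The 1 MiB buffer (an arbitrary choice) avoids many small file reads.
    @SuppressWarnings("unchecked")
    static <T> T load(String path) throws IOException, ClassNotFoundException {
        long start = System.nanoTime();
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(path), 1 << 20))) {
            T data = (T) in.readObject();
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Loaded " + path + " in " + millis + " ms");
            return data;
        }
    }
}
```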

We've looked into RMI, but the overhead of marshalling and transmission would kill performance. I was wondering if there is a more direct way to access data held by a program running in the same JVM, such as shared memory.

The multiple runs are to test different processing / parameters on the same input data.

I am open to suggestions on the best practice for achieving this kind of 'pre-loading'. Any hints would be much appreciated.

Thanks

arco2ch
  • " it would be optimal to hold it in memory and access it directly at each run ", this is what an application server does – AntJavaDev Sep 08 '15 at 08:10
  • By test, I assume we're talking about Unit Tests? – Nick Holt Sep 08 '15 at 08:29
  • By test I meant a runnable program that first loads the data, then operates on it with different parameters and produces output reports. Not unit tests, sorry for the confusion! – arco2ch Sep 08 '15 at 08:36
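The application-server idea from the first comment can be approximated without a server. Load the data once in a single long-running JVM, then run every parameter set against the same in-memory instance. This is a minimal sketch. `ParameterSweep`, `runAll`, and the toy data and thresholds are hypothetical stand-ins for the real loading and processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical driver: the input is loaded once, then reused for every run.
final class ParameterSweep {

    static <D, P, R> List<R> runAll(D data, List<P> paramSets,
                                    BiFunction<D, P, R> process) {
        List<R> reports = new ArrayList<>();
        for (P params : paramSets) {
            // Each run only reads 'data', so no reload or copy is needed.
            reports.add(process.apply(data, params));
        }
        return reports;
    }

    public static void main(String[] args) {
        // Stand-in for the large graph that would normally be deserialized here.
        int[] data = {4, 8, 15, 16, 23, 42};
        List<Integer> thresholds = Arrays.asList(10, 20);
        List<Long> counts = runAll(data, thresholds,
                (d, t) -> Arrays.stream(d).filter(x -> x > t).count());
        System.out.println(counts); // prints [4, 2]
    }
}
```

The trade-off is that the parameter sets must be fed to the running JVM, for example from a list, a file, or the console. They are not supplied by launching a fresh program each time.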

2 Answers


Java serialization is never going to play well as a persistence mechanism - changes to the classes can easily be incompatible with the previously stored objects meaning they can no longer be de-serialized (and in general all object models evolve one way or another).
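One partial mitigation, which also comes up in the comments, is to declare `serialVersionUID` explicitly. This stops the JVM from deriving a new value every time the class changes, so compatible edits such as adding a field can still read previously stored data. It cannot rescue incompatible changes such as changing a field's type. The `Measurement` class and its `roundTrip` helper below are hypothetical and only illustrate the declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical data class with a pinned serialVersionUID. Without it, the
// JVM computes a value from the class structure, which changes on most edits.
final class Measurement implements Serializable {
    private static final long serialVersionUID = 1L;

    final String sensor;
    final double value;

    Measurement(String sensor, double value) {
        this.sensor = sensor;
        this.value = value;
    }

    // Round-trips an object through Java serialization in memory.
    static Object roundTrip(Serializable obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```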

While requests for recommendations are really off-topic on SO, I would advise looking at a distributed cache such as Hazelcast or Coherence.

You'll still have to load the objects, but both Hazelcast and Coherence provide a scalable way to store objects that can be accessed from other JVMs. They also offer various ways to handle long-term persistence and evolving classes.

However, neither works well with big object graphs, so you should look at breaking the model apart into key/value pairs.

An example might be an order system where the key might be a composite like this:

```java
public class OrderItemKey
{
  private OrderKey orderKey;
  private int itemIndex;

  // ... (as a cache key, this class also needs equals() and hashCode())
}
```

And the value like this:

```java
public class OrderItem
{
  private ProductKey productKey;
  private int quantity;

  // ...
}
```

OrderItems could then live in one cache, while Products would live in another.
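To make the key/value split concrete without a cache product, here is a self-contained sketch that uses plain `ConcurrentHashMap`s as stand-ins for the two caches. The classes are simplified, hypothetical versions of the ones above: `OrderKey` and `ProductKey` are replaced by a `long` order id and a `String` product id. With Hazelcast or Coherence, the maps would be provided by the cache. Composite keys must implement `equals()` and `hashCode()` consistently, otherwise a lookup with an equal key will miss.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Self-contained sketch: two ConcurrentHashMaps stand in for two caches.
final class OrderCaches {

    // Composite key: which order, and which line within it.
    static final class OrderItemKey implements Serializable {
        final long orderId;
        final int itemIndex;

        OrderItemKey(long orderId, int itemIndex) {
            this.orderId = orderId;
            this.itemIndex = itemIndex;
        }

        // equals/hashCode make lookups with an equal (not identical) key work.
        @Override public boolean equals(Object o) {
            if (!(o instanceof OrderItemKey)) return false;
            OrderItemKey k = (OrderItemKey) o;
            return orderId == k.orderId && itemIndex == k.itemIndex;
        }

        @Override public int hashCode() {
            return Objects.hash(orderId, itemIndex);
        }
    }

    // Value: refers to the product by key instead of embedding the product.
    static final class OrderItem implements Serializable {
        final String productId;
        final int quantity;

        OrderItem(String productId, int quantity) {
            this.productId = productId;
            this.quantity = quantity;
        }
    }

    final Map<OrderItemKey, OrderItem> orderItems = new ConcurrentHashMap<>();
    final Map<String, String> productNames = new ConcurrentHashMap<>(); // productId -> name

    // Resolves a line item to its product name via the second "cache".
    String productNameFor(long orderId, int itemIndex) {
        OrderItem item = orderItems.get(new OrderItemKey(orderId, itemIndex));
        return item == null ? null : productNames.get(item.productId);
    }
}
```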

Once you've got a model that plays well with a distributed cache you need to look at co-locating related objects (so they're stored in the same JVM) and replicating reference objects.
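Co-location usually means partitioning on part of the key rather than on the whole key. The sketch below is simplified and assumes a fixed number of nodes with hash-based assignment. Real products use their own partitioning schemes; Hazelcast, for instance, has a `PartitionAware` interface for this purpose. `AffinityRouter` and its methods are hypothetical. Routing each item by the order id alone means all items of one order land on the same node.

```java
import java.util.Objects;

// Simplified partitioner over a fixed number of nodes.
final class AffinityRouter {
    private final int nodeCount;

    AffinityRouter(int nodeCount) {
        if (nodeCount <= 0) throw new IllegalArgumentException("nodeCount must be > 0");
        this.nodeCount = nodeCount;
    }

    // Routes by the full composite key: items of one order may scatter.
    int nodeForWholeKey(long orderId, int itemIndex) {
        return Math.floorMod(Objects.hash(orderId, itemIndex), nodeCount);
    }

    // Routes by the order id only (the "affinity key"); itemIndex is
    // deliberately ignored, so items of one order always co-locate.
    int nodeForItem(long orderId, int itemIndex) {
        return Math.floorMod(Long.hashCode(orderId), nodeCount);
    }
}
```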

When you're happy with the model, look at moving processing into the cache nodes where the objects reside, rather than pulling the objects out to perform operations on them. This reduces network load and can give considerable performance gains.
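The difference between pulling data out and shipping the computation in can be sketched without any cache product. Here each hypothetical "node" is just an in-memory map. `pullAndSum` copies every value to the caller before summing. `sumOnNodes` runs the aggregation next to each partition, so only one number per node comes back. Cache products expose the node-side variant through their own APIs (e.g. entry processors), which are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Each Map stands in for the partition held by one cache node.
final class NodeLocalAggregation {

    // Client-side: every value crosses the "network" before being summed.
    static long pullAndSum(List<Map<String, Integer>> nodes) {
        List<Integer> copied = new ArrayList<>();
        for (Map<String, Integer> node : nodes) {
            copied.addAll(node.values()); // the transfer we want to avoid
        }
        long total = 0;
        for (int v : copied) total += v;
        return total;
    }

    // Node-side: the task runs where the data lives; only one Long per
    // node is returned and combined by the caller.
    static long sumOnNodes(List<Map<String, Integer>> nodes,
                           Function<Map<String, Integer>, Long> task) {
        long total = 0;
        for (Map<String, Integer> node : nodes) {
            total += task.apply(node); // in a real cluster this would execute remotely
        }
        return total;
    }

    // The per-partition work shipped to each node.
    static long localSum(Map<String, Integer> partition) {
        long sum = 0;
        for (int v : partition.values()) sum += v;
        return sum;
    }
}
```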

Nick Holt
  • This is the proper way to do it, although the RAM disk solution worked out very well and is really straightforward. Thanks for pointing out the "right" way of doing it. (I cannot upvote as I don't have enough reputation.) – arco2ch Sep 08 '15 at 14:08
  • @arco2ch don't underestimate the problems you'll run into using Java Serialization as a persistence mechanism; `serialVersionUID`s can only do so much, and as soon as a field is removed or changes its type you'll be buggered – Nick Holt Sep 08 '15 at 16:57

If I understand correctly, you need to read a huge amount of data from disk and use this data only for testing purposes.

So every time you run the tests you need to reload it, and that slows down your tests.

If that is the situation, you can also try creating a RAM disk. Your file is then saved on a disk that has the performance of RAM.

On Linux systems you can create one with ramfs.
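On the Java side, only the path the program reads from changes. The sketch below assumes a RAM disk is already mounted. The default mount point `/mnt/ramdisk`, the default file name `data.ser`, and the `RamDiskStaging` class are hypothetical; on Windows, the directory would be a drive created by a RAM disk tool. The file is copied onto the RAM disk once, and later runs open the copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: stage the serialized data on a RAM disk once,
// then let every run read the RAM-disk copy instead of the original.
final class RamDiskStaging {

    // Copies 'source' into 'ramDiskDir' unless a usable copy already exists.
    static Path stage(Path source, Path ramDiskDir) throws IOException {
        Path target = ramDiskDir.resolve(source.getFileName());
        // Simplistic freshness check: reuse an existing copy of the same size.
        if (Files.exists(target) && Files.size(target) == Files.size(source)) {
            return target;
        }
        Files.createDirectories(ramDiskDir);
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: pass the real data file and mount point as arguments.
        Path data = Paths.get(args.length > 0 ? args[0] : "data.ser");
        Path ramDisk = Paths.get(args.length > 1 ? args[1] : "/mnt/ramdisk");
        Path staged = stage(data, ramDisk);
        System.out.println("Deserialize from: " + staged);
    }
}
```

Because RAM disk contents are lost on reboot, staging from the on-disk original keeps that original as the source of truth.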

Davide Lorenzo MARINO
  • Correct. Is there a way to achieve something similar on Windows? – arco2ch Sep 08 '15 at 08:30
  • Yes, I used a RAM disk on Windows too, many years ago. Sorry, I don't remember the name. – Davide Lorenzo MARINO Sep 08 '15 at 10:55
  • I tried [ImDisk](http://www.ltr-data.se/opencode.html/#ImDisk) and the deserialization time decreased by 80%. This may not be the nicest solution, but it sure helps for development purposes! Thanks Davide – arco2ch Sep 08 '15 at 14:06