Performance issue while using microstream

Question

I just started learning MicroStream. After going through the examples published in the MicroStream GitHub repository, I wanted to test its performance with an application that handles more data.

Application source code is available here.

Instructions to run the application and the problems I faced are available here

To summarize, these are my observations:

While loading a file with 2.8+ million records, processing takes 5 minutes
While calculating statistics based on loaded data, application fails with an OutOfMemoryError

Why is microstream trying to load all data (4 GB) into memory? Am I doing something wrong?

score 1 · Answer 1 · answered Apr 25 '22 at 08:36

MicroStream is not like a traditional database: it starts from the premise that all data lives in memory as an ordinary Java object graph. That object graph is persisted to disk (or another storage medium) when you store it through the StorageManager.

In your case, all data is in one list, so accessing that list reads every record from disk. The Lazy reference does not help the way you have used it, because it only guards access to that single list, which holds all the data.

Some optimizations you can introduce:

Split the data by vendorId or by day, using a Map<String, Lazy<List>>.
Once a Map value has been processed, remove it from memory again by clearing the lazy reference. See https://docs.microstream.one/manual/5.0/storage/loading-data/lazy-loading/clearing-lazy-references.html
Increase the number of channels to speed up reading and writing the data. See https://docs.microstream.one/manual/5.0/storage/configuration/using-channels.html
Don't store the object graph every 10000 lines; store it once at the end of loading.
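
To illustrate the first two points, here is a minimal sketch of the "one lazy partition per key, clear it after use" pattern. It deliberately uses only the JDK so it runs on its own: LazyRef is a hypothetical stand-in for MicroStream's Lazy reference (in a real application you would use one.microstream.reference.Lazy, and the data would be loaded from storage rather than from a Supplier). The names PartitionedStats and totalPerKey, and the use of plain Double amounts, are assumptions made up for this example, not taken from your code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for MicroStream's Lazy reference: it loads its
// subject on first access and can drop it from memory again.
final class LazyRef<T> {
    private final Supplier<T> loader; // simulates reading the partition from storage
    private T subject;                // null while the partition is not in memory

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (subject == null) {
            subject = loader.get();   // load on demand
        }
        return subject;
    }

    void clear() {
        subject = null;               // let the garbage collector reclaim the partition
    }

    boolean isLoaded() {
        return subject != null;
    }
}

final class PartitionedStats {
    // Computes one total per key (e.g. per vendorId) while keeping at most
    // one partition loaded at a time.
    static Map<String, Double> totalPerKey(Map<String, LazyRef<List<Double>>> partitions) {
        Map<String, Double> totals = new HashMap<>();
        for (Map.Entry<String, LazyRef<List<Double>>> entry : partitions.entrySet()) {
            LazyRef<List<Double>> partition = entry.getValue();
            double sum = 0.0;
            for (double amount : partition.get()) { // loads only this partition
                sum += amount;
            }
            totals.put(entry.getKey(), sum);
            partition.clear();                      // release it before touching the next one
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, LazyRef<List<Double>>> partitions = new HashMap<>();
        partitions.put("vendor-1", new LazyRef<List<Double>>(() -> List.of(10.0, 2.5)));
        partitions.put("vendor-2", new LazyRef<List<Double>>(() -> List.of(7.0)));
        System.out.println(totalPerKey(partitions));
    }
}
```

The point of the design is that peak memory is bounded by the largest single partition instead of the whole data set, which is what a single Lazy around one big list cannot give you.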

Hope this helps you solve the issues you are facing at the moment.
