I am writing an application that stores a lot of objects in MongoDB via Spring Data's MongoTemplate. To avoid synchronization, I create a separate MongoTemplate for each thread. In fact, each thread gets its own copy of just about everything to avoid synchronization.

The application processes events generated while users interact with a web page. Events from a specific user need to be processed sequentially, while events from different users can be processed in parallel. The application currently consists of N pipelines and a load balancer that distributes events to a pipeline based on a hash/mod of userId. At the end of each pipeline, data is written to MongoDB using Spring Data's MongoTemplate.
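For illustration, the hash/mod dispatch described above might look roughly like this. It is a minimal sketch, not the actual application code: `PipelineDispatcher`, `pipelineFor` and `dispatch` are hypothetical names, and each pipeline is assumed to be a single-threaded executor so that events for one user stay in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: routes each event to one of N single-threaded pipelines.
class PipelineDispatcher {
    private final List<ExecutorService> pipelines = new ArrayList<>();

    PipelineDispatcher(int pipelineCount) {
        for (int i = 0; i < pipelineCount; i++) {
            // One thread per pipeline keeps events for a given user in submission order.
            pipelines.add(Executors.newSingleThreadExecutor());
        }
    }

    // floorMod avoids a negative index when hashCode() is negative.
    int pipelineFor(String userId) {
        return Math.floorMod(userId.hashCode(), pipelines.size());
    }

    // The same userId always lands on the same pipeline.
    void dispatch(String userId, Runnable work) {
        pipelines.get(pipelineFor(userId)).execute(work);
    }

    // Stops accepting work and waits for queued events to finish.
    void shutdownAndWait() {
        for (ExecutorService p : pipelines) {
            p.shutdown();
        }
        try {
            for (ExecutorService p : pipelines) {
                p.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Events for different users can still run concurrently on different pipelines, which is where the shared static cache becomes visible.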

This single process is able to process 2500 events/second. However, I am observing significant contention between threads (at 2500 events/second, any blocking becomes significant), all of it in ClassTypeInformation, where the synchronized CACHE is accessed.

Unfortunately, MongoTemplate uses ClassTypeInformation, which stores its cache in a synchronized map. So, no matter what I try, writing data to MongoDB always hits this contention between my worker threads.

I think ClassTypeInformation should be converted into a bean so that users can provide their own instance if they so desire. Allowing this would remove the contention between multiple threads.

Does anyone know why this was implemented as a static as opposed to a normal Spring bean? Are there any plans to make this change?

Alex Paransky

1 Answer

No, there are no plans. For multiple reasons: ClassTypeInformation is a value object, not an injectable. Instances have to be created on-the-fly very frequently. It simply doesn't make sense to configure an instance of it with Spring, as instances have to be created usually when we encounter a Class of some sort, need to inspect it and maintain and resolve generics information. Using a static factory ensures we can apply improvements to the object creation without having to modify all the clients.

We've applied a lot of benchmarking and tweaks to remove some performance bottlenecks in the most recent releases, and ClassTypeInformation has never even come close to showing up in profiler analysis.

Generally speaking I'd start way before digging into low-level internals:

  • What's the reason you think you need to start parallelizing insertions in the first place? MongoTemplate.insertAll(…) is basically the fastest way you can insert objects into MongoDB if you want to leverage our object-to-document mapping facilities. Introducing parallelism can in fact slow things down, as it is what creates the need to synchronize in the first place.
  • Why do you think creating separate MongoTemplate instances is improving things? MongoTemplate is thread-safe and thus can be shared between threads.
  • Have you considered plugging in custom converters that manually transform objects into a DBObject? By default we have to use reflection to transform the former into the latter, and that of course comes at a cost.
  • Do you actually need to start with objects in the first place? Sometimes data read from an input source can be mapped into DBObjects directly which allows you to bypass the object-to-document mapping entirely.
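To make the custom converter idea concrete, here is a minimal sketch of a hand-written mapping. `PageEvent`, `PageEventWriter` and the field names are hypothetical, and a plain `Map` stands in for the driver's DBObject so the example has no dependencies. In a real setup the same logic would live in an implementation of Spring's `Converter` interface that returns a DBObject and is registered with the mapping converter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical event type; the fields are illustrative only.
class PageEvent {
    final String userId;
    final String type;
    final long timestamp;

    PageEvent(String userId, String type, long timestamp) {
        this.userId = userId;
        this.type = type;
        this.timestamp = timestamp;
    }
}

// Hand-written mapping: no reflection or type inspection on the write path.
// A Map is used here as a stand-in for DBObject (an assumption of this sketch).
class PageEventWriter {
    static Map<String, Object> toDocument(PageEvent event) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("userId", event.userId);
        document.put("type", event.type);
        document.put("ts", event.timestamp);
        return document;
    }
}
```

Because every field is copied explicitly, the mapping cost is a handful of map puts per event. The trade-off is that the converter has to be kept in sync with the model by hand.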

Generally speaking, using all of the convenience of a data access API usually works against the goal of getting the best possible performance. Instead of sticking to the convenience and trying to parallelize everything, it's often easier to give up some of the convenience to avoid the costly code paths.

If you really find any issues with a default setup, we're happy to take a bug report with some evidence of your findings (an executable test case or the like) to see what we can improve. But I think it's helpful to start simple and not try to be too clever in the first place.

Oliver Drotbohm
  • Oliver, I updated the main question with the reason why there are so many parallel writers. In this particular case, I think going with custom converters works out better. My model is small enough that this is quite workable. However, if ClassTypeInformation were a pluggable bean, it might have been easier for me to avoid contention in this specific case (at the cost of more memory usage) by making it a prototype-scoped bean. – Alex Paransky May 24 '15 at 20:29