
We are trying to find an in-memory database with index support that we can use for our application. We are looking at Aerospike, Apache Ignite, Geode, and VoltDB. There is not much to distinguish them, and every one claims to be fast and to have great community support.

Of these, Aerospike and VoltDB are C/C++-based, while Apache Ignite and Geode are Java-based.

Since there is little to choose between these databases in terms of performance, and it is hard to test which one will work better for us in production, I am trying to find out whether the performance of an in-memory database also depends on whether it is written in Java or in C/C++. Garbage collection issues are quite frequent, and GC is hard to tune properly for a given use case (which may change over time). Is it true that the Java-based databases will be at a disadvantage?

Thanks

Tuco

3 Answers


You can't really conclude that one database is faster than another just because it is written in language X rather than language Y. A database is a very complex product with many features. Some queries may be faster in one database, other queries in another.

The only way to find out is to test your specific use case.
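On the JVM, one cheap way to see how much garbage collection actually costs under your own workload is to read the collectors' counters from the standard `java.lang.management` API before and after a run. The sketch below does that; `runWorkload` is a hypothetical stand-in (a `HashMap` put/get loop) that you would replace with calls into the database client you are evaluating. It only measures the JVM it runs in, so for a client/server database you would read the same MXBeans from the server process (for example over JMX).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class GcProbe {
    // Sum collection count and accumulated collection time (ms) over all collectors.
    static long[] gcTotals() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if a collector does not report it; treat that as 0.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    // Hypothetical stand-in workload: replace with calls to the database under test.
    static long runWorkload(int ops) {
        Map<Integer, byte[]> store = new HashMap<>();
        long hits = 0;
        for (int i = 0; i < ops; i++) {
            store.put(i % 100_000, new byte[256]);               // write: allocates a new value
            if (store.get((i * 7) % 100_000) != null) hits++;    // read
        }
        return hits;
    }

    public static void main(String[] args) {
        int ops = 1_000_000;
        long[] before = gcTotals();
        long start = System.nanoTime();
        long hits = runWorkload(ops);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long[] after = gcTotals();
        System.out.printf("ops=%d hits=%d elapsed=%dms gcCount=%d gcTime=%dms%n",
                ops, hits, elapsedMs, after[0] - before[0], after[1] - before[1]);
    }
}
```

Comparing `gcTime` with `elapsed` for a realistic mix of reads and writes gives a first, rough answer to "how much is GC costing me here" before committing to a full production-scale test.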

Pavel Tupitsyn
  • I'm pretty sure the language doesn't matter, but the database name can tell us how hot (fast) it is :) – Randoom Apr 19 '17 at 11:51
  • Thanks, but testing what will happen in production with millions of items on all of these databases is tough. – Tuco Apr 19 '17 at 11:52
  • Yes. Such is life. – Pavel Tupitsyn Apr 19 '17 at 11:53
  • I would tend to agree with this answer. It is also true that Java's garbage collection can have an impact. Testing with a few million records on a small (2-3 node) cluster shouldn't be that painful. Aerospike provides simple benchmark tools. Pavel is right that results will depend on the workload. For small records (a few kilobytes) with mixed read (simple key-value lookups) and write workloads, I am certain Aerospike will not disappoint you! (I do work at Aerospike.) – Meher Apr 19 '17 at 18:46
  • You're correct, but in certain situations, such as garbage collection, a database written in Java will definitely show the language it's based on, even if you use a super-duper non-standard JVM. – Ronen Botzer Apr 21 '17 at 10:56
  • @RonenBotzer Some Java databases, like Ignite, store data in unmanaged (off-heap) memory to avoid GC costs. – Pavel Tupitsyn Apr 21 '17 at 10:59
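To illustrate what storing data in unmanaged (off-heap) memory means in plain Java, here is a minimal sketch built on the standard `ByteBuffer.allocateDirect`, which places the buffer's contents in native memory outside the Java heap. Records stored this way are raw bytes rather than Java objects, so the GC has nothing per record to trace or copy; only the small `ByteBuffer` wrapper is a heap object. `OffHeapLongStore` and its fixed 16-byte slot layout are hypothetical and are not how Ignite implements its storage.

```java
import java.nio.ByteBuffer;

public class OffHeapLongStore {
    // Each record is a fixed 16-byte slot: an 8-byte key followed by an 8-byte value.
    private static final int SLOT = 16;
    private final ByteBuffer buf;   // direct buffer: contents live in native memory
    private final int capacity;

    public OffHeapLongStore(int capacity) {
        this.capacity = capacity;
        this.buf = ByteBuffer.allocateDirect(capacity * SLOT);
    }

    // Absolute puts write straight into the buffer; no per-record Java objects are created.
    public void put(int slot, long key, long value) {
        if (slot < 0 || slot >= capacity) throw new IndexOutOfBoundsException("slot " + slot);
        buf.putLong(slot * SLOT, key);
        buf.putLong(slot * SLOT + 8, value);
    }

    public long key(int slot) { return buf.getLong(slot * SLOT); }

    public long value(int slot) { return buf.getLong(slot * SLOT + 8); }

    public boolean isDirect() { return buf.isDirect(); }

    public static void main(String[] args) {
        OffHeapLongStore store = new OffHeapLongStore(1_000);
        for (int i = 0; i < 1_000; i++) store.put(i, i, (long) i * i);
        System.out.println(store.key(999) + " -> " + store.value(999)); // 999 -> 998001
    }
}
```

The trade-off is that data must be encoded to and decoded from bytes on every access, so serialization cost takes the place of GC cost.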

For an in-memory DB that maintains consistency like Geode does (i.e. it replicates synchronously to other nodes before releasing the client thread), your network is going to be a bigger concern than the HotSpot compiler. Still, here are two pieces of advice to get you to the point where language is irrelevant:

1) If you are doing many more creates/updates than reads, use off-heap memory on the server. This minimizes GCs.

2) Use Geode's serialization mapping between C/C++ and Java objects to avoid JNI. Specifically, use the DataSerializer: http://gemfire.docs.pivotal.io/geode/developing/data_serialization/gemfire_data_serialization.html If you plan to use queries extensively rather than gets/puts, use the PDXSerializer: http://gemfire.docs.pivotal.io/geode/developing/data_serialization/use_pdx_serializer.html
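As I understand Geode's data serialization, a class implementing `DataSerializable` writes and reads its own fields through `toData(DataOutput)` and `fromData(DataInput)`, so no reflection or generic Java serialization is involved. The sketch below shows only that hand-written, field-by-field pattern with plain `java.io` so it runs without Geode on the classpath; `Trade` and its fields are hypothetical, and a real Geode class would implement `org.apache.geode.DataSerializable` instead of defining these methods on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class Trade {
    long id;
    String symbol;
    double price;

    public Trade() {}  // no-arg constructor: the object is created first, then filled by fromData

    public Trade(long id, String symbol, double price) {
        this.id = id;
        this.symbol = symbol;
        this.price = price;
    }

    // Write fields explicitly in a fixed order: no class metadata, no reflection.
    public void toData(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(symbol);
        out.writeDouble(price);
    }

    // Read fields back in exactly the order toData wrote them.
    public void fromData(DataInput in) throws IOException {
        id = in.readLong();
        symbol = in.readUTF();
        price = in.readDouble();
    }

    static byte[] serialize(Trade t) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            t.toData(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Trade deserialize(byte[] data) {
        Trade t = new Trade();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            t.fromData(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return t;
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Trade(7L, "ACME", 12.5));
        Trade copy = deserialize(data);
        // 8 (long) + 2 + 4 (UTF length + "ACME") + 8 (double) = 22 bytes
        System.out.println(copy.id + " " + copy.symbol + " " + copy.price + " (" + data.length + " bytes)");
    }
}
```

Because both sides agree on a fixed field order, a C++ client can write the same byte layout, which is what lets the two languages exchange objects without going through JNI.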

Wes Williams

I guess I'm going to be the contrarian.

All else being equal, compiled code is faster than the JVM, and there's simply no garbage collection that you need tactics to avoid.

Having been written in C/C++, eXtremeDB (my company's product) is able to avoid using the C run-time memory management altogether. Managing the memory area entirely within the database software enables the use of highly efficient, purpose-specific memory managers, and eliminates the potential for memory leaks from the whole-system point of view (e.g. if 200GB is set aside for the in-memory database, it will never exceed 200GB). eXtremeDB is not unique in this regard; other in-memory DBMSs written in C/C++ are also able to avoid the C run-time malloc/free or C++ new/delete. So please don't ding me for making a product pitch; I'm not. I'm pointing out a capability that is possible with a C/C++ implementation but may not be available with a JVM.
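The idea of reserving a fixed region up front and handing out memory from it with a purpose-specific allocator can be sketched as a simple bump (arena) allocator. This is a hypothetical minimal example, not eXtremeDB's memory manager, and it is written in Java only so that all examples here use one language; the concept is the same in C/C++, where the region would come from a single up-front allocation. What it shows is that an allocation can fail once the budget is used up, but the region never grows past the size chosen at startup.

```java
import java.nio.ByteBuffer;

public class BumpArena {
    private final ByteBuffer region;  // fixed-size region reserved once, up front
    private int top = 0;              // offset of the next free byte

    public BumpArena(int bytes) {
        region = ByteBuffer.allocateDirect(bytes);
    }

    // Returns the offset of a block of `size` bytes aligned to `align` (a power of two),
    // or -1 if the region is exhausted. The arena never grows beyond its initial size.
    public int allocate(int size, int align) {
        if (align <= 0 || (align & (align - 1)) != 0) {
            throw new IllegalArgumentException("align must be a power of two");
        }
        int start = (top + align - 1) & -align;  // round top up to the next multiple of align
        if (size < 0 || start < 0 || start > region.capacity() - size) return -1;
        top = start + size;
        return start;
    }

    // Release everything at once (e.g. at the end of a transaction); there is no per-block free.
    public void reset() { top = 0; }

    public int used() { return top; }

    public int capacity() { return region.capacity(); }

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(64);
        System.out.println(arena.allocate(10, 8)); // 0
        System.out.println(arena.allocate(10, 8)); // 16 (rounded up from 10)
        System.out.println(arena.allocate(40, 8)); // -1: would exceed the 64-byte budget
        System.out.println(arena.allocate(32, 8)); // 32
        System.out.println(arena.used() + "/" + arena.capacity()); // 64/64
    }
}
```

Real database allocators are more elaborate (size-class pools, per-object free lists), but the bounded footprint comes from the same principle: all allocations are carved out of one region whose size is fixed in advance.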

The other answerers are correct that a crappy implementation of a SQL execution plan for a given query can overwhelm any advantage of compiled code over the JVM, but at some point you've got to have confidence that your DBMS vendor knows what they are doing (and is interested in improving their product if a plan is demonstrably inefficient or wrong). If you're not using SQL, then the quality of the SQL optimizer is not part of the equation, and it really comes down to how well the database system's index methods are written and the availability of different types of indexes for different search requirements (e.g. a hash index will generally be better than a b-tree for exact-match lookups, but a hash index can't support partial-key (wildcard) search or ordered retrieval).
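The hash vs. ordered-index trade-off can be illustrated with the JDK's own maps: `HashMap` answers exact-match lookups but keeps no key order, so a prefix (wildcard) search would have to scan every entry, while `TreeMap` keeps keys sorted and can jump to the start of a prefix range and return keys in order. This is a hypothetical in-process sketch, not any database's index implementation; the names and values are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class IndexChoice {
    // "Hash index": key -> row position; fast exact match, no key ordering.
    static Map<String, Integer> buildHash(String... keys) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < keys.length; i++) index.put(keys[i], i);
        return index;
    }

    // "Ordered index": same mapping, but keys are kept sorted (a red-black tree here).
    static NavigableMap<String, Integer> buildTree(String... keys) {
        NavigableMap<String, Integer> index = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) index.put(keys[i], i);
        return index;
    }

    // Partial-key search: seek to the first key >= prefix, then walk forward in order.
    static List<String> prefixScan(NavigableMap<String, Integer> index, String prefix) {
        List<String> out = new ArrayList<>();
        for (String key : index.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break;  // sorted keys: first non-match ends the range
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"smith", "smythe", "jones", "smart", "brown"};
        Map<String, Integer> hash = buildHash(names);
        NavigableMap<String, Integer> tree = buildTree(names);
        System.out.println(hash.get("jones"));       // exact match: 2
        System.out.println(prefixScan(tree, "sm"));  // [smart, smith, smythe]
        System.out.println(tree.firstKey());         // ordered retrieval: brown
    }
}
```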

There are some public (independent, audited) benchmarks you can look to. We have participated in a few STAC-M3 benchmarks, though only one other DBMS has done so as well (none of the DBMSs you listed have).

Steven Graves
  • Java-based products (such as Apache Ignite) also use unmanaged memory to avoid GC overhead. "compiled code is faster than the JVM": the JVM also runs compiled code. Yes, C++ allows writing faster code than Java in some cases, but this does not mean that all C++ code is necessarily faster. There is a very interesting blog series on how a simple C# program turned out faster than a simple C++ program: https://blogs.msdn.microsoft.com/jonathanh/2005/05/20/optimizing-managed-c-vs-native-c-code/ – Pavel Tupitsyn Apr 20 '17 at 09:58