I'm running a query that returns a lot of data. It matches 916 documents, each of which has a large data field (around 5 MB). The query looks like this:
db.collection.find(
    {'name': somename, 'currency': mycurrency,
     'valuation_date': {'$in': list_of_250_datetime_datetime}},
    {'data_column': include_data}  # include_data is set to True or False in the test results below
).limit(x)
I have been trying to optimize the query and found that most of the time is spent loading (or transmitting) the large data field, rather than looking it up in the 5 GB database. So I assume the query itself is properly optimized and the indexes are used correctly, which the profiler also confirms.
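For reference, this is roughly how I checked the plan (a minimal pymongo sketch using the same placeholder query; if the index is used, the winning plan should show an IXSCAN rather than a COLLSCAN):

    # Minimal sketch of the plan check; filter values are the same placeholders as above.
    plan = db.collection.find(
        {'name': somename, 'currency': mycurrency,
         'valuation_date': {'$in': list_of_250_datetime_datetime}}).explain()
    print(plan['queryPlanner']['winningPlan'])  # expect IXSCAN, not COLLSCAN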
I had assumed that loading the data from disk would account for most of the time, but when I use the in-memory storage engine, things actually get slower. How is this possible? And what else can I do to speed things up?
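The timing numbers below come from a small test script, roughly like this (a simplified sketch; the database and collection names, filter values, and port assignments are placeholders for my real setup):

    import time
    import datetime
    from pymongo import MongoClient

    # Placeholders standing in for the real values:
    somename = 'some_name'
    mycurrency = 'EUR'
    list_of_250_datetime_datetime = [datetime.datetime(2017, 1, 1)]  # really 250 dates

    client = MongoClient('mongodb://localhost:27017/')  # 27018 for the in-memory instance
    collection = client.mydb.collection

    def timed_find(limit, include_data):
        # {'data_column': False} excludes the 5 MB field; for the "full json"
        # case the sketch passes no projection so every field comes back.
        projection = None if include_data else {'data_column': False}
        start = time.time()
        docs = list(collection.find(
            {'name': somename, 'currency': mycurrency,
             'valuation_date': {'$in': list_of_250_datetime_datetime}},
            projection).limit(limit))
        print('++++++++++ Query completed in %s seconds for %s items ++++++++++'
              % (time.time() - start, len(docs)))

    timed_find(100, include_data=False)  # excluding data column
    timed_find(100, include_data=True)   # full json with data
    timed_find(0, include_data=False)    # limit(0) means no limit, i.e. all 916 documents
    timed_find(0, include_data=True)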
In-memory storage engine:
================ Starting test using mongodb://localhost:27018/ ================
Looking up 100 values excluding data column...
++++++++++ Query completed in 0.0130000114441 seconds ++++++++++
Looking up 100 values, full json with data...
++++++++++ Query completed in 2.85100007057 seconds ++++++++++
Looking up all values, excluding data column...
++++++++++ Query completed in 0.0999999046326 seconds for 916 items ++++++++++
Looking up all values, full json with data...
++++++++++ Query completed in 29.2250001431 seconds for 916 items ++++++++++
WiredTiger:
================ Starting test using mongodb://localhost:27017/ ================
Looking up 100 values excluding data column...
++++++++++ Query completed in 0.0120000839233 seconds ++++++++++
Looking up 100 values, full json with data...
++++++++++ Query completed in 2.97799992561 seconds ++++++++++
Looking up all values, excluding data column...
++++++++++ Query completed in 0.0700001716614 seconds for 916 items ++++++++++
Looking up all values, full json with data...
++++++++++ Query completed in 23.8389999866 seconds for 916 items ++++++++++