Outgrew MongoDB … now what?

Question

We dump debug and transaction logs into MongoDB.

We really like MongoDB because:

Blazing insert perf
Document oriented
Ability to let the engine drop inserts when needed for performance

But there is this big problem with MongoDB: The index must fit in physical RAM. In practice, this limits us to 80-150 GB of raw data (we currently run on a system with 16 GB RAM).

So, for us to have 500 GB or 1 TB of data, we would need 50 GB or 80 GB of RAM.
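As a rough back-of-envelope check on the numbers above, the sketch below extrapolates RAM needs from the observed ratio (16 GB of RAM for 80-150 GB of raw data). It assumes the index grows linearly with raw data, which is an assumption for illustration and not something measured in the post; the function name `ram_needed_gb` is hypothetical.

```python
def ram_needed_gb(raw_data_gb, ram_gb=16.0, raw_capacity_gb=150.0):
    """Linearly extrapolate RAM from an observed RAM-to-raw-data ratio.

    ram_gb / raw_capacity_gb is the ratio seen on the current system
    (16 GB of RAM handling somewhere between 80 and 150 GB of raw data).
    """
    return raw_data_gb * ram_gb / raw_capacity_gb


if __name__ == "__main__":
    for data_gb in (500, 1000):
        # Optimistic case: the current box tops out at 150 GB of raw data.
        low = ram_needed_gb(data_gb, raw_capacity_gb=150.0)
        # Pessimistic case: the current box tops out at 80 GB of raw data.
        high = ram_needed_gb(data_gb, raw_capacity_gb=80.0)
        print(f"{data_gb} GB raw data: roughly {low:.0f} to {high:.0f} GB of RAM")
```

Under that linear assumption the estimate lands in the same range as the 100 or 200 GB servers mentioned below.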

Yes, I know this is possible. We can add servers and use MongoDB sharding. We can buy a special server box that can take 100 or 200 GB of RAM, but this is the tail wagging the dog! We could spend beaucoup $$$ on hardware to run FOSS, when SQL Server Express can handle WAY more data on WAY less hardware than Mongo (SQL Server does not meet our architectural desires, or we would use it!) We are not going to spend huge $ on hardware here, because it is necessary only because of the Mongo architecture, not because of the inherent processing/storage needs. (And sharding? Cost aside, who needs the ongoing complexity of three, five, or more servers to manage a relatively small load?)

Bottom line: MongoDB is FOSS, but we gotta spend $$$$$$$ on hardware to run it? We should rather buy commercial software!

I'm sure we are not the first to hit this issue, so we ask the community:

Where do we go next?

(We already run Mongo v2)

Reduce the number of indexes. Or in other words: Your database design is odd. — mailq, Nov 27 '11 at 21:28
Congratulations. You selected an open source database without knowing what you were really doing, and now it comes back and hits you. Real life. Deal with it - either replace the database with something commercial, most likely, or put in more hardware, or rework it for sharding with fewer indices. — TomTom, Nov 27 '11 at 21:31
Hey, don't comment on what you don't know. We have worked with mongodb for a long time, we know it well, and we really like it. Our index is really small, and has been reworked a few times. The issue is simple: We just outgrew mongodb, as we suspect others have, and we want the community's input as to where to go next. — Jonesome Reinstate Monica, Nov 27 '11 at 21:42
The fact that I store way more data on a similar server in a MySQL database leads to one conclusion: "Wrong database design". — mailq, Nov 27 '11 at 21:48
You didn't outgrow MongoDB, you didn't do your research. There's a huge difference. — ceejayoz, Nov 28 '11 at 02:21
@ceejayoz: Can you please take it at face value? We did our research, and happily ran MongoDB for a year. We are now faced with new scaling needs, and don't like the Mongo answers... This is common in the Mongo world. I am just asking where folks have gone who have hit this issue. — Jonesome Reinstate Monica, Nov 28 '11 at 06:54
The fact you're touting 500GB-1TB as 'outgrowing' MongoDB is, no offense, an indicator of ignorance when the system was created in the first place. — thinice, Nov 29 '11 at 03:45
You may not like the answers to your "scaling" problem because you don't actually have a scaling problem; you have a design and implementation problem. You are not indexing efficiently. — gWaldo, Nov 29 '11 at 03:57
But seriously, if you feel that you absolutely must keep indexes of that size, you're going to have the same problem of keeping abominably huge indexes in RAM in any database product you seek out. You would have to buy a high-capacity server (DL380 G7 can make that, and it's a mid-range commodity server; nothing exotic) to store those indexes. — gWaldo, Nov 29 '11 at 04:03
I personally find the "Bottom line: MongoDB is FOSS, but we gotta spend $$$$$$$ on hardware to run it? We should rather buy commercial SW!" to scream of ignorance and arrogance. — gWaldo, Nov 29 '11 at 04:10
Well, seriously - being the guy who designed the system to an unworkable state telling people you made a big mistake that their statement is ignorant and arrogant shows... you may want to change jobs. McDonalds always needs people to serve burgers. Really. This WAS your design mistake. — TomTom, Nov 29 '11 at 05:31
@tomtom does this mean we've finally found a database that your mobile phone can't run? — Rob Moir, Nov 29 '11 at 12:30
my mobile phone cannot run a lot of databases ;) But then, 500 GB is small in MY world... ;) my current project has 21 TB and my personal nontrivial database around 1 TB. — TomTom, Nov 29 '11 at 12:46
@RobMoir Everyone knows that TomTom's mobile phone runs *web servers* not database servers. Stop trying to fit a square peg into a round hole. Pick the right tool for the job next time. — MDMarra, Nov 29 '11 at 13:48
Well MY phone runs a database damnit, so why can't everyone else's! — Rob Moir, Nov 29 '11 at 14:08
`"Bottom line: MongoDB is FOSS, but we gotta spend $$$$$$$ on hardware to run it? We should rather buy commercial SW!"` You're right. All FOSS runs on magic fairy dust and none of it needs suitable hardware to run. I'd look at GoldenUnicornDB if I were you. It doesn't even need to run on a computer, it runs on hugs and laughter. — MDMarra, Nov 29 '11 at 14:53
I can't tell if he's trolling, being deliberately obtuse, or really thinks that downloading a Linux distro entitles him to a free PC. Has vgv8 returned?! — gWaldo, Nov 29 '11 at 23:48
Despite everything, I learnt a lot from this question, the comments and the answers! — hplbsh, Nov 30 '11 at 16:39
@RobMoir The proper tool for a Database server is an iPad. You need the extra RAM. — voretaq7, Nov 30 '11 at 21:11
And here we see that MongoDB is web-scale. http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale — Avery Payne, Dec 02 '11 at 20:07

score 16 · Answer 1 · answered Nov 27 '11 at 22:05

If you are at a point where the current performance is too slow or the limits are reached then you have three options. And they are true for any problem.

1) Scale vertically: Meaning increase your machine power. More CPU or more RAM.
2) Scale horizontally: Meaning increase the number of workers. More processes, more threads, more machines.
3) Change design: Do it differently. Other software, other algorithms, other storage system, other whatever.

As you excluded 1) and 2) from your options, there is only solution 3) left. So go for it...

answered Nov 27 '11 at 22:05

mailq


We know all that. If you read my post, you can see that I am looking for an alternative to mongodb, because we have rejected the mongodb hardware reqs, for both vertical and horizontal scaling. It is basically a "Change Design" question, and we are asking what folks have done in this situation. Thanks! – Jonesome Reinstate Monica Nov 27 '11 at 23:15
@samsmith Design questions are getting closed as "not constructive". Switch to [whatever platform](http://nosql-databases.org/) meets your requirements. But it is on you to extensively test your setup before switching. I'm currently testing Cassandra as an alternative (but this is *my* subjective decision). Your needs are different from mine! – mailq Nov 28 '11 at 00:58

Jonesome Reinstate Monica · Accepted Answer · 2019-09-13T19:53:27.940

We posted this same question on the Mongo forum, and the Mongo CTO responded, saying to review his presentation on how to optimize indexes:

http://www.10gen.com/presentations/mongosf2011/schemascale

In this presentation, Mr. Horowitz states explicitly that sharding/horizontal scaling can be overkill in many situations, and that design approaches (including some rather non-intuitive approaches that are kind of specific to Mongo) can make a given server scale much farther.

This presented some interesting concepts, including using client side logic to optimize how the db is used in a number of "non normalized" ways. There is a clear subtext to the presentation to the effect "if you just build by the book, you can easily hit unwanted problems related to scaling." For example, Mr. Horowitz (the CTO of 10Gen, maker of MongoDB) presents a "hybrid" design in which instead of one document per "record" you put perhaps 100 "records" in a document, resulting in a "bucket" kind of approach. This is done explicitly to reduce the index footprint. This is something that is coded on the client, and is not a "feature" of MongoDB. This hybrid approach may work for us, and could give us a 4x or 8x improvement in index size.
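The "bucket" idea can be sketched in a few lines of client-side Python. This is a minimal illustration, not the code from the presentation: the field names (`ts`, `start`, `records`), the helper names, and the use of a sorted list as a stand-in for the database index are all assumptions, and timestamps are assumed to be unique.

```python
from bisect import bisect_right

BUCKET_SIZE = 100  # records per bucket document ("perhaps 100" in the text above)


def build_buckets(records, bucket_size=BUCKET_SIZE):
    """Pack time-ordered records into bucket documents.

    Each bucket stores the timestamp of its first record in "start";
    that is the only field that would need an index entry.
    """
    ordered = sorted(records, key=lambda r: r["ts"])
    buckets = []
    for i in range(0, len(ordered), bucket_size):
        chunk = ordered[i:i + bucket_size]
        buckets.append({"start": chunk[0]["ts"], "records": chunk})
    return buckets


def find_record(buckets, index_keys, ts):
    """Use the (small) index to pick a bucket, then scan it on the client."""
    pos = bisect_right(index_keys, ts) - 1  # last bucket with start <= ts
    if pos < 0:
        return None
    for record in buckets[pos]["records"]:
        if record["ts"] == ts:
            return record
    return None


# 10,000 hypothetical log records collapse into 100 bucket documents.
records = [{"ts": t, "msg": f"log line {t}"} for t in range(10_000)]
buckets = build_buckets(records)
index_keys = [b["start"] for b in buckets]  # stand-in for the database index
```

Here one index entry covers a whole bucket, so the number of index entries drops from one per record to one per bucket. The cost is the extra scan inside the bucket, which is the client-side logic the presentation refers to.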

He also discusses "right balanced" btrees, which is basically optimizing the index design so that most queries access only the "right hand piece" of the index (as opposed to random access across the index, which, to perform well, requires that the whole index fit in RAM). This approach will not help us, as we need to query all over the index.
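A toy simulation shows why the access pattern matters. It models a sorted index as fixed-size leaf pages and counts how many distinct pages a batch of lookups touches; the page size, key count, and "most recent 1%" window are hypothetical values chosen only for illustration, and real B-tree behavior is more involved.

```python
import random

KEYS_PER_PAGE = 100      # hypothetical number of keys per index leaf page
TOTAL_KEYS = 1_000_000   # hypothetical number of keys in the index
QUERIES = 10_000


def pages_touched(keys, keys_per_page=KEYS_PER_PAGE):
    """Count distinct leaf pages hit by lookups in a sorted integer-keyed index."""
    return len({key // keys_per_page for key in keys})


rng = random.Random(42)

# Random access: lookups land anywhere in the key space.
random_keys = [rng.randrange(TOTAL_KEYS) for _ in range(QUERIES)]

# Right-hand access: lookups only hit the most recent 1% of keys.
recent_start = TOTAL_KEYS - TOTAL_KEYS // 100
recent_keys = [rng.randrange(recent_start, TOTAL_KEYS) for _ in range(QUERIES)]

total_pages = TOTAL_KEYS // KEYS_PER_PAGE
print("random access touches", pages_touched(random_keys), "of", total_pages, "pages")
print("right-hand access touches", pages_touched(recent_keys), "of", total_pages, "pages")
```

The right-hand pattern keeps the working set to a small slice of the index, while random lookups spread across most of it, which is the case described above where the whole index needs to fit in RAM.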

We are going to use these concepts as part of a review of our system.

(Interesting that of all the places I posted this question, the only person with a constructive response is the CTO of MongoDB itself.)

UPDATE (2017)

We found MongoDB, ultimately, to not be appropriate in a production env. Every couple of months, it dumps/trashes the entire db, and all data is lost. (It is not a primary data source, so we can live with the problem, though not happily.)

We have now completed a project to move to the elasticsearch stack, and are rolling that to production now. (We have used elasticsearch successfully for years.) Elasticsearch data is not as durable as, say, Microsoft SQL Server (we have had elasticsearch clusters fail with unrecoverable data loss), but elasticsearch is at least 10x (experientially, more than 100x) more reliable than MongoDB. (Elasticsearch, intelligently, makes no pretense of supporting Windows as a production platform, one of the big sins of MongoDB.)

We expect to have purged our entire env of MongoDB over the coming weeks.

Onward!

UPDATE (2018-2019)

The elasticsearch stack has delivered. We have found it to be very reliable, very scalable, and have not looked back at all. Mongo smelled great at the time, but it has been gone for years now, and good riddance to it. We have been running two elastic clusters, one for log data (which replaced our Mongo server), and a second for real application data. Each cluster has 1-2 TB of data. It took a lot of architecture work (particularly on the application side) to get elastic to both scale and perform, but deliver it does.

Oh, snap! 10gen has done a remarkable job with documentation, presentations, outreach, and community support. It is especially surprising given how small their team is. — gWaldo, Nov 29 '11 at 13:19
10gen does a great job; however, many of the ideas presented in the video mentioned above to optimize indexes are really hacks that I have not seen documented elsewhere. — Jonesome Reinstate Monica, Nov 29 '11 at 17:02
_`the only person with a constructive response`_ - In my view, this is mostly due to the un-constructive way in which the question was posed (at least, here). It simply tells you that Joe Community on SF doesn't keep an interest in teaching you about mongodb in quite the same way as MongoDB itself :) I'm sure you'll find that if you posted the question as _`Help, our MongoDB doesn't scale! Are we doing it wrong?`_ you'd have received **exactly the same** presentation link **within minutes** of you posting the question. — sehe, Nov 30 '11 at 08:10

score 4 · Answer 3 · answered Nov 29 '11 at 04:08

You may not like the answers to your "scaling" problem because you don't actually have a scaling problem; you have a design and implementation problem. You are not indexing efficiently.

Seriously, if you feel that you absolutely must keep indexes of that size, you're going to have the same problem of keeping abominably huge indexes in RAM in any database product you seek out. You would have to buy a high-capacity server (DL380 G7 can make that, and it's a mid-range commodity server; nothing exotic) to store those indexes.

By way of helping, a search for "mongodb optimizing indexes" turns up several useful links:

http://www.mongodb.org/display/DOCS/Optimization

http://www.10gen.com/events/indexingmatters

http://www.deanlee.cn/programming/mongodb-optimize-index-avoid-scanandorder/

http://www.slideshare.net/kbanker/mongo-indexoptimizationprimer

You may get defensive about having done your research, but those of us who use MongoDB in Production know that you are missing many points.

Further, the comment "Bottom line: MongoDB is FOSS, but we gotta spend $$$$$$$ on hardware to run it? We should rather buy commercial SW!" screams of ignorance and arrogance.

answered Nov 29 '11 at 04:08

gWaldo


"screams of ignorance and arrogance." ??? No, it means that, at least in this case, FOSS is far from free for production use, because of hardware needs. – Jonesome Reinstate Monica Nov 29 '11 at 17:01
gWaldo: I have used MongoDB for a full year, and I like it a lot. And I have read those docs. That said, we are reviewing our implementation with those docs in mind, and will see if we can achieve further optimization before we decide to leave MongoDB. – Jonesome Reinstate Monica Nov 29 '11 at 17:05
I would also note that the CTO of 10gen, in his video on indexing and scaling, presents a number of approaches that are hacks. They are fine, but they are not "up the middle, clearly documented" implementation proposals. – Jonesome Reinstate Monica Nov 29 '11 at 17:06
@samsmith So what? If the hacks work, why not use them?! – mailq Nov 29 '11 at 22:51
@samsmith FOSS "is software that is liberally licensed to grant users the right to use, study, change, and improve its design through the availability of its source code." You still have to provide hardware to run it on. By all means, use commercial software if it meets your needs, but you will *still* have to purchase the hardware in addition to the software... I can't tell if you're trolling, being deliberately obtuse, or really think that downloading a linux distro entitles you to a free PC. – gWaldo Nov 29 '11 at 23:35
Our index is fully optimized, we have spent a lot of time in this area. The issue is that if you have a 15GB index, that is accessed randomly, MongoDB requires at least 15GB of RAM or else performance totally dies. This is a MongoDB specific performance issue. RDBMS systems are able to quickly seek the index on disk, without loading the entire index into memory. MongoDB appears to lack this capability. – Jonesome Reinstate Monica Nov 30 '11 at 03:22
RDBMS's don't have some magical "read quickly from disk" ability that nothing else does. Indices are meant to be in RAM, and reading from disk is slow; those are facts of life. My contention is that you have too much Index for your hardware. You either need to be more selective about what you index, or invest in hardware. But plainly you believe that you have the ULTIMATE MongoDB setup, and nobody could improve upon it. I have tried to help you, but you plainly don't want to listen. I'm done feeding the trolls. Best of luck to you... – gWaldo Nov 30 '11 at 05:36

score 2 · Answer 4 · answered Nov 27 '11 at 23:48

Why would you say "SQL Server Express can handle WAY more data on WAY less hardware than Mongo (SQL Server does not meet our architectural desires, or we would use it!)"? If you need to change your database architecture (since your other database can't scale like you need it to), and you would use SQL Server, the answer to me is to fix your database design to work with SQL Server. The only thing I can think of that isn't "fixable" is if you truly desire an ACID-less database (and it strikes me as odd that debug and transaction log inserts are OK to be dropped).

answered Nov 27 '11 at 23:48

Jim B


Even then I would use an in-memory database (like Redis) as a buffer before transferring the data to SQL Server. There is always a solution, but you have to think first. Saying "we are stuck, help me" is not constructive. I can't even see the requirement... – mailq Nov 28 '11 at 01:09
@mailq The OP already said SQL Server was fine. We can certainly argue about whether you need to buffer a database buffer, but from the OP it seems irrelevant. – Jim B Nov 28 '11 at 02:20
@Jim B -- The issue we are hitting is a common one on Mongo. It sounds like the folks responding here are either not reading my post, or have no real suggestions. Mongo is very very good at what it does, and we like it a lot. We are just not willing to "play the mongo game" RE scaling. We are looking for something that gives us at least some of Mongo's benefits with a more hardware-efficient scaling model. I think our question is pretty clear. – Jonesome Reinstate Monica Nov 28 '11 at 06:07
I think your question is pretty clear. You've decided that you do not want to use mongo any more (or you were simply ignorant of the fact that it requires hardware to scale) because your architectural needs changed. Yes, this is a typical problem for mongo when you didn't understand that it was designed to be easily scalable via hardware. There is an O'Reilly scaling book about how to add hardware to a mongodb cluster. Your defined architectural needs were to scale via hardware (thus mongo) - clearly that's changed, so it's time to use something else. I suggested SQL Server as you said you'd use it. – Jim B Nov 29 '11 at 01:15
