
I am trying to store records with a set of doubles and ints (around 15-20 fields) in MongoDB. The records mostly (99.99%) have the same structure.

When I store the data in ROOT, a highly structured data storage format, the file is around 2.5GB for 22.5 million records. For Mongo, however, the database size (from the command show dbs) is around 21GB, whereas the data size (from db.collection.stats()) is around 13GB.

This is a huge overhead (to clarify: 13GB vs 2.5GB, and I'm not even talking about the 21GB), and I guess it is because Mongo stores both keys and values for every record. So the question is: why doesn't Mongo do a better job of making this smaller, and how could it?
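The key-name overhead guessed at above can be estimated without a BSON library. Here is a rough sketch in pure Python (the field names are hypothetical physics-style columns, not taken from the question, and the size formula ignores _id, padding, and int32 vs int64 differences):

```python
def bson_size_estimate(doc):
    """Rough BSON size of a flat document of 8-byte numbers:
    4-byte length prefix + per-element (1 type byte + key name + NUL +
    8-byte value) + 1 trailing NUL byte."""
    size = 4 + 1  # document length prefix + terminator byte
    for key in doc:
        size += 1 + len(key.encode("utf-8")) + 1 + 8
    return size

# Hypothetical record with 18 short numeric fields (names invented here).
record = {k: 0.0 for k in
          "px py pz e m q run event nhits chi2 eta phi pt dxy dz iso id t".split()}

raw_bytes = 8 * len(record)              # values only: 144 bytes
bson_bytes = bson_size_estimate(record)  # key names + framing included: 230 bytes
print(raw_bytes, bson_bytes)             # key names alone add ~60% here
```

Multiplied across 22.5 million records, even short key names add gigabytes, which accounts for part (though not all) of the 13GB vs 2.5GB gap.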

But the main question is: what is the performance impact of this? I have 4 indexes and they come out to 3GB, so running the server on a single 8GB machine can become a problem if I double the amount of data and try to keep a large working set in memory.

Any thoughts on whether I should be using SQL or some other DB? Or maybe just keep working with ROOT files, if anyone has tried them?

xcorat
  • This question can mean different things depending on your situation. 1. If you are new to Mongo, the answer you most likely need is the un-accepted answer by Lix. 2. The actual question is about compression, so if you are using an old Mongo, or somehow don't have compression, take a look at Mongo's compression engines. – xcorat Mar 07 '21 at 02:54

2 Answers


Basically, this is Mongo preparing for the insertion of data. Mongo performs preallocation of storage for data to prevent (or minimize) fragmentation on disk. This preallocation is observed in the form of files that the mongod instance creates.

First it creates a 64MB file, then 128MB, then 256MB, and so on, doubling each time until it reaches files of 2GB (the maximum size of preallocated data files).
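As a rough illustration, the preallocated files needed to hold the asker's ~13GB of data can be enumerated. This is a sketch assuming the classic MMAPv1-style doubling scheme described above (64MB start, 2GB cap); exact behavior depends on the MongoDB version and settings:

```python
def prealloc_files_mb(data_mb):
    """File sizes (in MB) under a 64MB-start, doubling, 2GB-cap scheme,
    allocating files until they can hold data_mb of data."""
    files, size, total = [], 64, 0
    while total < data_mb:
        files.append(size)
        total += size
        size = min(size * 2, 2048)  # cap file size at 2GB
    return files

files = prealloc_files_mb(13 * 1024)  # ~13GB of data
print(len(files), sum(files))         # 11 files, ~13.9GB allocated
```

Because each file is preallocated before it is filled, the on-disk footprint always runs ahead of the actual data size, which is part of the 21GB vs 13GB difference.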

There are some other things Mongo does that might be suspected of using more disk space, such as journaling...

For much, much more info on how MongoDB uses storage space, you can take a look at this page, in particular the section titled "Why are the files in my data directory larger than the data in my database?"

There are some things you can do to minimize the space used, but these techniques (such as the --smallfiles option) are usually only recommended for development and testing use, never for production.

Lix
  • +1 because this is a thorough explanation which deserves credit and @xcorat has not marked this as the accepted answer. – tandrewnichols Sep 09 '14 at 14:02
  • Thanks for the answer, but it DOES NOT answer the question specifically. This overhead is not due to preallocation or journaling, but to how the data is stored! I mentioned the data size is ~13GB, which is still 6x larger than the ROOT files! – xcorat Apr 17 '15 at 02:52
  • 1
    The new mongoDB versions has built in compression engines, and does a very good job of compressing similar data. So consider the main part of this question relevant only for previous versions. – xcorat Oct 11 '16 at 18:01

Question: Should you use SQL or MongoDB?

Answer: It depends.

A better way to ask the question: Should you use a relational database or a document database?

Answer:

  • If your data is highly structured (every row has the same fields), or you rely heavily on foreign keys and you need strong transactional integrity on operations that use those related records... use a relational database.
  • If your records are heterogeneous (different fields per document) or have variable length fields (arrays) or have embedded documents (hierarchical)... use a document database.
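To make the distinction above concrete, here is a sketch of the two shapes of data (the documents are hypothetical examples, not taken from the question):

```python
# Homogeneous, fixed-width rows: every record has the same numeric fields.
# This maps cleanly onto a relational table (or a columnar format like ROOT).
flat_row = {"run": 1, "event": 42, "px": 0.3, "py": -1.1}

# Heterogeneous / nested record: variable-length arrays and embedded
# documents have no natural flat-table representation, so a document
# database is the better fit.
nested_doc = {
    "run": 1,
    "event": 43,
    "tracks": [{"px": 0.3, "py": -1.1}, {"px": 0.7, "py": 2.2}],  # variable-length array
    "trigger": {"name": "HLT_Mu9", "prescale": 1},                # embedded document
}
```

The asker's data (15-20 numbers with a 99.99% identical structure) looks like the first shape, which is one reason a relational or columnar store fits it well.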

My current software project uses both. Use the right tool for the job!

Dan H