2

I'm totally new to Mongo, and decided to give it a try with my dateset, which I store now in MySQL. I have started with a single table, which turns out to be document collection now.

Each document has about 30 properties and each property is generally a short string. So in MySQL I have varchar fields, and here in Mongo it's just BSON.

I was really surprised when I found out that the database size I got in Mongo almost 7 times bigger than mysql innodb. With 1,500,000 records/documents it is about 350 Mb in MySQL and 2.3 Gb in Monbo since it stores property name for each single piece of data.

Is there a way to come up with 'schema' and normalize data to reduce the database size?

UPD: I use MongoDB 3.4, and since db.serverstatus().storageEngine.name is mmapv1, I suppose I use MMap.

Sample document:

{
  "_id" : ObjectId("595c4f4342ce99299c19c379"),
  "someNumber" : "11420",
  "companyDataFromSource1" : {
    "lolNumber" : "11420",
    "businessName" : "Feather mighty shop Inc",
    "businessOwner" : "",
    "businessHealthCode" : "Q",
    "realUpdateDate" : "13-OCt-16",
    "someNumericInfo" : "10000",
    "yearWhenItHappened" : "2014",
    "moreDate" : "22-Jan-02",
    "anotherStatus" : "DE",
    "numValue1" : "1",
    "numValue2" : "1",
    "someProp" : "E",
    "anotherProp" : "R",
    "street" : "BB251 Database St",
    "city" : "Denver",
    "state" : "CO",
    "zip" : "12345",
    "country" : "US",
    "anotherStreet" : "TT251 Server St",
    "anotherCity" : "Sratford",
    "anotherState" : "WI",
    "anotherZip" : "54484",
    "anotherCountry" : "US",
    "telephone" : "(123) 481-2222",
    "fax" : "",
    "emailAddress" : "qwwqwwqeewe@hotmail.com",
  }
}

db.collecion('collection').stats results:

{
  "ns" : "db.collection",
  "size" : 1507059792.0,
  "count" : 1495099,
  "avgObjSize" : 1008,
  "numExtents" : 19,
  "storageSize" : 1580150784.0,
  "lastExtentSize" : 415174656.0,
  "paddingFactor" : 1.0,
  "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
  "userFlags" : 1,
  "capped" : false,
  "nindexes" : 1,
  "totalIndexSize" : 60518752,
  "indexSizes" : {
      "_id_" : 60518752
  },
"ok" : 1.0
}
Dmitry Samoylov
  • 1,228
  • 3
  • 17
  • 27
  • what's your version of mongodb ? Which engine are you using (WiredTiger, MMap, ... ? ) What's the compression level ? can you provide also a sample document from you collection, and the output of `db.collection.stats()` command? – felix Jul 05 '17 at 07:13
  • @fellix I have updated the question – Dmitry Samoylov Jul 05 '17 at 07:28
  • Possible Duplicate of [Is shortening MongoDB property names worthwhile?](https://stackoverflow.com/questions/12790861/is-shortening-mongodb-property-names-worthwhile) – Neil Lunn Jul 05 '17 at 10:22

2 Answers2

2

A few hints on how to reduce your database size:

Database engine

First, switch from MMApv1 to WiredTiger storage engine, as it's way more efficient to compress data. The default compression level is snappy, but you can compress even more using zlib with few performance trade off. See Mongodb 3.0 for details.

To move to WiredTiger, see this tutorial from MongoDB documentation: https://docs.mongodb.com/manual/tutorial/change-standalone-wiredtiger/

Reduce key size

Second thing is to reduce your documents size (currently ~1008 bytes according to the avgObjSize field ). To do this, use smaller keys (ideally, two letter long keys). for example,

companyDataFromSource1 ==> c1

The benefits from this will be small with compression enabled, but this is always a good practice as it reduce the size of data that will be send to your app / over network

felix
  • 9,007
  • 7
  • 41
  • 62
  • Thanks a lot! Switching to WiredTiger reduced the size of database files to 417 Mb, which is at least comparable to MySQL database size. – Dmitry Samoylov Jul 05 '17 at 08:28
1

Is there a way to come up with 'schema' and reduce the database size?

No. Mongodb is schemaless, that is one of its core features. You could use shorter names for your properties. Or use a compressing storage engine (WiredTiger).

Sergio Tulentsev
  • 226,338
  • 43
  • 373
  • 367