
In my project, I have servers that send ping requests to websites, measure their response time, and store the result every minute.

I'm going to use MongoDB and I'm looking for the best data model. Which of these is better?

1- have a collection for each website, with each request as a document (1,000 collections)

or

2- have a single collection for all websites, with each website as a document and each request as a sub-document.
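For illustration, roughly what one stored record would look like under each model (field names like ms are just placeholders):

// Model 1: one collection per website, e.g. "pings_example_com";
// every ping request becomes its own document
{ ts: ISODate("2013-01-01T00:01:00Z"), ms: 123 }

// Model 2: one "websites" collection, one document per website;
// every ping request is appended as a sub-document
{
  website: "example.com",
  responses: [
    { ts: ISODate("2013-01-01T00:01:00Z"), ms: 123 },
    { ts: ISODate("2013-01-01T00:02:00Z"), ms: 98 }
  ]
}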

– Yahia Reyhani

2 Answers


You could do either, but in both cases you will have to factor in the periodic growth of the database. While the data files are being expanded, the database will be slow or unresponsive. (There might be a setting that makes this happen in the background; I forget.)

A related question - MongoDB performance with growing data structure, specifically the "Padding Factor"

With the first approach, there is an upper limit to the number of websites you can store, imposed by the maximum number of collections. You can do the calculations based on http://docs.mongodb.org/manual/reference/limits/.

With the second approach, the number of collections doesn't matter as much, but the growth of the database is something you will want to consider.

One approach is to initialize each document with empty data, so it lasts longer before it has to expand.

For instance:

{
  website: name,
  responses: [
    { time: ISODate("2013-01-01T00:01:00Z") /* ... */ },
    { time: ISODate("2013-01-01T00:02:00Z") /* ... */ }
    // ... and so on for each minute/interval you expect
  ]
}

The downside is that initialization might take longer, but then you won't have to worry about growth later.
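A rough sketch of how that initialization could be scripted in the mongo shell (the websites collection, the per-day document layout, and the ms placeholder are my assumptions, not a fixed recipe):

// Pre-allocate one document per website per day, with one
// placeholder slot per minute, so the document is written at
// its full size and never has to be moved as data arrives.
function preallocateDay(website, day) {
    var responses = [];
    for (var m = 0; m < 1440; m++) {             // 1440 minutes per day
        responses.push({ minute: m, ms: null }); // placeholder value
    }
    db.websites.insert({ website: website, day: day, responses: responses });
}

preallocateDay("example.com", ISODate("2013-01-01T00:00:00Z"));

Each measurement then overwrites its slot in place instead of growing the document:

db.websites.update(
    { website: "example.com", day: ISODate("2013-01-01T00:00:00Z") },
    { $set: { "responses.5.ms": 123 } }   // minute 5 of that day
)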

Either way, it is a cost you will have to pay; the only question is when: now or later?

Consider reading their use cases, particularly http://docs.mongodb.org/manual/use-cases/hierarchical-aggregation/.

– Nasir

Both solutions run into one particular MongoDB limitation. With the first one (each website a collection), the limit is the number of collections: each collection has a namespace entry, and the namespace file is 16MB by default, so around 16,000 entries fit (the namespace file size can be increased). In my opinion this is the much better solution, since you said 1,000 collections are expected and that can be handled. (Keep in mind that indexes have their own namespace entries and count toward the 16,000.) In this case you store the requests as individual documents, which are generally much easier to work with afterwards than an embedded array.
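As a rough sketch of that first model (the per-site collection name and the ms field are my assumptions), each ping is a plain insert, and an index on the timestamp keeps time-range queries cheap:

// One collection per website; each ping request is one document
db.pings_example_com.ensureIndex({ ts: 1 })
db.pings_example_com.insert({ ts: new Date(), ms: 123 })

// All response times for a given day
db.pings_example_com.find({
    ts: { $gte: ISODate("2013-01-01T00:00:00Z"),
          $lt:  ISODate("2013-01-02T00:00:00Z") }
})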

Embedded array limitation. In the second case the limitation is a hard one: a document cannot grow beyond 16MB. That is the BSON size limit, and it can hold quite a lot, but if you use huge documents that vary in size and keep changing size over time, your storage will become fragmented. The reason will be clear if you watch this webinar. Basically, this is the worst thing you can do in terms of storage usage.
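For contrast, in the embedded-array model every new measurement grows the website document in place (same assumed field names as above), which is exactly the growth pattern that forces document moves and fragments storage:

db.websites.update(
    { website: "example.com" },
    { $push: { responses: { ts: new Date(), ms: 123 } } }
)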

If you are likely to use the aggregation framework for further analysis, that will also be harder with the embedded-array design.
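For example, with per-request documents an hourly average is a single $group (a sketch with the same assumed field names), while the embedded-array model would first need an extra $unwind over responses:

db.pings_example_com.aggregate([
    { $group: {
        _id:   { day: { $dayOfMonth: "$ts" }, hour: { $hour: "$ts" } },
        avgMs: { $avg: "$ms" }
    } }
])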

– attish
  • Thank you. I will use the first solution (each website a collection). When collections reach the limit I can add another database, and if my number of sites grows (I think that will take a year or two) I am considering Cassandra and Hadoop. – Yahia Reyhani Jun 24 '13 at 12:51