
I can't go into much detail because of an NDA, but I hope an overview of the system will be enough for you to help me make a decision about our databases.

I'm building an app that will help vendors compete for clientele by making strategic offers based on the storefronts' inventory and purchase records.

One side of the app lets store owners view the offers presented to them, network, and so on. I have that running on a standard PHP/MySQL setup.

My question concerns the inventory records. We are talking about millions of records almost immediately. The sample data I'm using is a roll-up of four of their managers (they have dozens) over a year or two, and it already has over 500k rows with 30 or more columns. Once we have scores of stores with all of their managers, it will be massive, at least compared to anything I've worked with so far.

Vendors will have their own side of the product, where they can search through these records and make competitive offers based on them.

Is the sheer size a good reason to use something like Mongo? Or is it more a matter of how the data is laid out and what it consists of? Or is there some other factor I'm not considering?

And if not Mongo/NoSQL, is there some other approach or technology that data stores this large would benefit from (sharding, an Amazon cloud database, etc.)?

Thanks

John Blythe
  • Very good question, and one well worth pondering and answering if we wanted to write a book on Mongo and/or MySQL. From a programming perspective either is possible, but the differences between the query languages may prove confusing and unproductive. Based on your description of the application space, I would use Mongo. – mozillanerd Mar 03 '13 at 04:09


Answers ...

Q: Is the sheer size a good reason to use something like mongo?

A: I think so. Mongo was built from the ground up to scale massively. Replica sets give you redundancy and failover (and can serve reads from secondaries), while sharding partitions a collection across multiple servers so you can scale horizontally. It also has features for controlling which geographically distributed data centers your data is stored in.
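To illustrate why sharding on a hashed key spreads load, here is a toy model in plain Python. It is only a sketch under stated assumptions: the shard count, the `"<store_id>:<record_id>"` key format, and the helpers `shard_for` and `distribute` are hypothetical, and MD5-plus-modulo is used purely for illustration. Real MongoDB sharding splits the key space into chunks that a balancer migrates between shards, rather than taking a simple modulo.

```python
import hashlib
from collections import Counter

# Toy model of hash-based sharding (hypothetical, for illustration only).
NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard-key value to a shard index via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many records would land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


if __name__ == "__main__":
    # Hypothetical shard key for inventory records: "<store_id>:<record_id>".
    keys = [f"store{s}:{r}" for s in range(50) for r in range(2000)]
    counts = distribute(keys)
    for shard in sorted(counts):
        print(f"shard {shard}: {counts[shard]} records")
```

Because the hash scatters consecutive keys, no single shard receives all the writes for one busy store, which is the property you want when data arrives in large bursts.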

Q: Or is it more a matter of how the data is laid out / what it consists of?

A: Mongo is a document database, and you're right: the data models will be different. You have to think about data in a denormalized way rather than a normalized one. Just like any technology, there are pros and cons to storing things as documents.
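As a rough illustration of "denormalized instead of normalized", the sketch below folds rows from three relational-style tables into one document per manager. All table names, field names, and the `denormalize` helper are hypothetical; the point is only the shape of the result, not a recommended schema for this app.

```python
from collections import defaultdict

# Normalized rows, as they might sit in separate MySQL tables (hypothetical).
stores = {1: {"name": "Main St"}, 2: {"name": "Harbor"}}
managers = {10: {"name": "Ann", "store_id": 1}, 11: {"name": "Raj", "store_id": 2}}
purchases = [
    {"manager_id": 10, "sku": "A-100", "qty": 5},
    {"manager_id": 10, "sku": "B-200", "qty": 2},
    {"manager_id": 11, "sku": "A-100", "qty": 7},
]


def denormalize(stores, managers, purchases):
    """Embed each manager's purchases, plus store info, into one document.

    The store name is copied into every manager document; that duplication
    is what lets a reader fetch everything in one lookup without a join.
    """
    by_manager = defaultdict(list)
    for p in purchases:
        by_manager[p["manager_id"]].append({"sku": p["sku"], "qty": p["qty"]})

    docs = []
    for manager_id, m in managers.items():
        docs.append({
            "_id": manager_id,
            "manager": m["name"],
            "store": {"id": m["store_id"], "name": stores[m["store_id"]]["name"]},
            "purchases": by_manager.get(manager_id, []),
        })
    return docs


if __name__ == "__main__":
    for doc in denormalize(stores, managers, purchases):
        print(doc)
```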

Some pros: there is no fixed schema to migrate, so schema changes are easy to manage. Data maps more naturally onto the objects in your application. You don't have to pay the price of complicated or slow joins.

Some cons: schemas can become inconsistent from one document to the next, so you have to manage them yourself. Data is also repeated, and if that duplication is not managed, the copies can become inconsistent.
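At the time of this discussion (the MongoDB 2.x releases) the server did not enforce a schema, so one common way to manage it is to check documents in application code before writing them. This is a minimal sketch under that assumption; `REQUIRED_FIELDS` and `validate` are hypothetical names and the field list is illustrative only.

```python
# Application-side schema check (hypothetical names and fields).
REQUIRED_FIELDS = {
    "store_id": int,
    "sku": str,
    "qty": int,
}


def validate(doc: dict) -> list:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(doc[field]).__name__}"
            )
    return problems


if __name__ == "__main__":
    print(validate({"store_id": 1, "sku": "A-100", "qty": 3}))
    print(validate({"store_id": "1", "sku": "A-100"}))
```

Running every write through one function like this keeps the "schema" in a single place in your code, which is the kind of management the con above refers to.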


In general I think Mongo would be a good choice at that scale. Mongo has a new aggregation framework (added in version 2.2) that brings many SQL concepts, such as filtering, grouping, and sorting, to queries on documents, which makes complex queries easier to write. Mongo also has map/reduce for processing that the aggregation framework can't express.
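To make the SQL comparison concrete, the sketch below writes an aggregation pipeline as a list of stage dicts (`$match`, `$group`, `$sort`) next to the SQL it roughly corresponds to. With pymongo the list would be passed to `collection.aggregate(pipeline)`; here the collection and the field names (`year`, `sku`, `qty`) are hypothetical, and `run_locally` is a plain-Python stand-in that computes the same result without a server.

```python
from collections import defaultdict

# Aggregation pipeline as stage dicts (field names are hypothetical).
pipeline = [
    {"$match": {"year": 2012}},
    {"$group": {"_id": "$sku", "total_qty": {"$sum": "$qty"}}},
    {"$sort": {"total_qty": -1}},
]

# Roughly the SQL this corresponds to:
#   SELECT sku, SUM(qty) AS total_qty FROM purchases
#   WHERE year = 2012 GROUP BY sku ORDER BY total_qty DESC;


def run_locally(docs):
    """Plain-Python equivalent of the pipeline above, for illustration only."""
    totals = defaultdict(int)
    for d in docs:
        if d.get("year") == 2012:          # $match
            totals[d["sku"]] += d["qty"]   # $group with $sum
    # $sort descending on total_qty
    return sorted(
        ({"_id": sku, "total_qty": t} for sku, t in totals.items()),
        key=lambda r: r["total_qty"],
        reverse=True,
    )


if __name__ == "__main__":
    sample = [
        {"year": 2012, "sku": "A-100", "qty": 5},
        {"year": 2012, "sku": "B-200", "qty": 9},
        {"year": 2012, "sku": "A-100", "qty": 7},
        {"year": 2011, "sku": "B-200", "qty": 50},
    ]
    print(run_locally(sample))
```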

After using Mongo daily for about a year, I've really enjoyed the support around it as a product and the general ease of setting it up and working with it.

ryan1234
  • Great feedback, thanks! I briefly mentioned the search component of this massive data store above. In the reading I've done over the last day or two, I've seen quite a few people say that Mongo isn't all that great at full-text search. Are the new modules, frameworks, etc. helping to resolve such issues? – John Blythe Mar 04 '13 at 00:50
  • They are adding full-text search in their next version; I believe the 2.4 release candidate has it: http://docs.mongodb.org/manual/release-notes/2.4/#text-indexes. You should be able to download it and experiment. Alternatively, whenever you write documents to Mongo, you can also duplicate each one and send it to Solr or ElasticSearch. Or you can create a batch job that extracts data from Mongo into Solr/ElasticSearch on a schedule. – ryan1234 Mar 04 '13 at 02:27
  • Could you elaborate a bit more on ElasticSearch and Solr? I've seen people discuss them, but I haven't yet followed the breadcrumbs to see what they are. – John Blythe Mar 04 '13 at 03:07
  • Both ElasticSearch and Solr are built on top of the Apache Lucene search library (http://lucene.apache.org/). The nice thing about both of them is that you can easily insert JSON documents into them. Then you can pick which fields to search on and even give certain fields a "boost" to influence search results. Check out ElasticSearch: http://www.elasticsearch.org/ – ryan1234 Mar 04 '13 at 03:18
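The two ideas in these comments, duplicating each write into a search engine and boosting certain fields, can be sketched together with in-memory stand-ins. This is a toy under loud assumptions: `PRIMARY` plays the role of Mongo, `SearchIndex` is a hypothetical miniature inverted index standing in for Solr/ElasticSearch, and the boost weights are made up. Real Lucene scoring is far more involved; this only shows how a per-field boost changes ranking.

```python
import re
from collections import defaultdict

# Hypothetical per-field boosts: a match in "name" counts 3x a description match.
FIELD_BOOSTS = {"name": 3.0, "description": 1.0}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Toy inverted index: term -> {doc_id: boosted term frequency}."""

    def __init__(self, boosts):
        self.boosts = boosts
        self.postings = defaultdict(lambda: defaultdict(float))

    def add(self, doc_id, doc):
        # Note: re-adding an existing doc_id adds to its weights (no replace).
        for field, boost in self.boosts.items():
            for term in tokenize(doc.get(field, "")):
                self.postings[term][doc_id] += boost

    def search(self, query):
        scores = defaultdict(float)
        for term in tokenize(query):
            for doc_id, weight in self.postings.get(term, {}).items():
                scores[doc_id] += weight
        # Highest score first; ties broken by doc_id for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))


PRIMARY = {}                        # stand-in for the Mongo collection
INDEX = SearchIndex(FIELD_BOOSTS)   # stand-in for Solr/ElasticSearch


def save(doc_id, doc):
    """Write to the primary store, then duplicate the document to the index."""
    PRIMARY[doc_id] = doc
    INDEX.add(doc_id, doc)


if __name__ == "__main__":
    save("p1", {"name": "organic coffee beans", "description": "fair trade"})
    save("p2", {"name": "tea sampler", "description": "pairs well with coffee"})
    print(INDEX.search("coffee"))
```

Here a search for "coffee" ranks `p1` above `p2` because the term appears in the boosted `name` field. In ElasticSearch itself, the comparable knob is the `^` suffix on field names in a `multi_match` query (for example `"name^3"`); the scheduled batch-job alternative would simply call `INDEX.add` over documents read back from the primary store instead of inside `save`.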