
I use MongoDB, in which the data changes (updates) frequently — every minute. The data is read from MongoDB through a third-party API application via HTTP. That API also aggregates the data before returning it, for example computing the sum of views over the last X days for page N.

Because the data volume keeps growing (a few of these collections have grown from 6 GB to 14 GB), in some cases there is a 2–7 second delay before the API returns the aggregated data. For a web application that delay is too long, and I want to reduce it somehow.

Which models or patterns fit the situation I described? Maybe, first of all, I should abandon the HTTP API idea and move all the API logic to the server side?

My own ideas and considerations:

Maybe there should be two separate data "processors":

1) The first processor does all the aggregation work and just writes the results to the second one.

2) The second processor returns data as-is, without any internal calculations or aggregations.

But there can also be a bottleneck when the first processor writes to the second data store, and there has to be logic for reconciling new and old data, which also impacts performance.
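The two-processor idea above can be sketched in a few lines. This is a minimal illustration only — the in-memory dict standing in for the second data store, the function names, and the sample data are all my assumptions, not part of the question:

```python
# Minimal sketch of the two-"processor" idea: one worker precomputes
# aggregates on a schedule, a second one serves them without computing.

raw_views = [("home", 3), ("home", 9), ("about", 2)]  # fake source data

serving_store = {}  # stands in for the second (read-only) data store


def batch_processor():
    """Processor 1: do all aggregation work, then write to the store."""
    totals = {}
    for page, views in raw_views:
        totals[page] = totals.get(page, 0) + views
    serving_store.update(totals)  # single bulk write, no partial state


def serve(page):
    """Processor 2: return precomputed data with no calculations."""
    return serving_store.get(page, 0)


batch_processor()
print(serve("home"))  # 12
```

The read path never touches the raw data, so its latency stays flat as the collections grow; the cost is that `serve()` returns slightly stale results between batch runs — exactly the trade-off the bottleneck concern above points at.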


1 Answer


That third-party application seems to be doing a bad job, so you should drop it. You can probably fix your problems by refactoring the data model or using better aggregation algorithms.

Pre-calculations

Using a batch processor and a real-time processor sounds like a good idea, but I don't think you'll need it yet (see below). If you still want to implement it, you should read about the Lambda architecture, because it fixes some problems your approach might have.

This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate precomputed views, while simultaneously using real-time stream processing to provide dynamic views. The two view outputs may be joined before presentation.

Data Model (6 rules of thumb)

You're saying that there are a lot of updates — that's a red flag when using MongoDB. Some kinds of updates can slow MongoDB down because of its distributed nature. For example, try inserting subdocuments instead of updating fields. But this isn't an exact science, so I can't help further without seeing the data model.
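To make the "insert subdocuments instead of updating fields" suggestion concrete, here is a sketch of the two write shapes as MongoDB update documents. The collection and field names (`pages`, `views_total`, `views`) are my assumptions for illustration, not taken from the question:

```python
# Update-heavy model: every page view rewrites the same counter field
# in place on one document.
counter_filter = {"page": "home"}
counter_update = {"$inc": {"views_total": 1}}
# pymongo: db.pages.update_one(counter_filter, counter_update)

# Insert-heavy alternative: append a small view-event subdocument per
# hit (or per time bucket), so writes are append-only rather than
# repeatedly rewriting the same fields.
event_update = {"$push": {"views": {"ts": "2014-11-16T12:00:00Z", "count": 1}}}
# pymongo: db.pages.update_one(counter_filter, event_update)

print(sorted(counter_update), sorted(event_update))
```

Whether the append-only shape actually helps depends on the access pattern — an array that grows without bound inside one document brings its own problems — which is why the answer defers to the linked "6 rules of thumb" articles and the actual data model.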

Aggregation Framework

Databases are made for processing data, so move the aggregation into MongoDB. Map-reduce is slow in MongoDB, so use the Aggregation Framework instead.
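For the use case from the question — the sum of views over the last X days for a page — an Aggregation Framework pipeline would look roughly like this. The collection and field names (`pageviews`, `page`, `ts`, `views`) are assumptions; the pure-Python loop at the end only illustrates what the pipeline computes:

```python
from datetime import datetime, timedelta

DAYS = 7
cutoff = datetime(2014, 11, 16) - timedelta(days=DAYS)

# Aggregation pipeline: filter to the last X days, then sum views per page.
pipeline = [
    {"$match": {"ts": {"$gte": cutoff}}},
    {"$group": {"_id": "$page", "total_views": {"$sum": "$views"}}},
]
# pymongo: results = db.pageviews.aggregate(pipeline)

# Pure-Python reference of the same computation, on sample documents:
sample = [
    {"page": "home", "ts": datetime(2014, 11, 15), "views": 3},
    {"page": "home", "ts": datetime(2014, 11, 1), "views": 9},  # outside window
    {"page": "about", "ts": datetime(2014, 11, 14), "views": 2},
]
totals = {}
for doc in sample:
    if doc["ts"] >= cutoff:
        totals[doc["page"]] = totals.get(doc["page"], 0) + doc["views"]
print(totals)  # {'home': 3, 'about': 2}
```

Unlike a cron-driven map-reduce in JS, the pipeline runs as native code inside MongoDB and can use indexes for the `$match` stage, which is where the speedup over map-reduce mainly comes from.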

Christian Strempfer
  • I can't go into more detail without seeing the third-party application code and its data model. – Christian Strempfer Nov 16 '14 at 12:42
  • The third-party application is used to decide which algorithm to use, or, for example, just to retrieve data by method name. The main idea of that application was to separate the data-aggregation logic from the web logic itself. Are there any models for that approach? Maybe for the data model I should try to move all calculations to Hadoop and just push (stream) the aggregation results to MongoDB? Right now most of the calculations are carried out by MongoDB itself by executing JS files, and I must admit it's hard to manage so many of them. – deividaspetraitis Nov 16 '14 at 13:31
  • @qutwala: So the 3rd-party app is basically the data layer — then it's OK. Switching to Hadoop is an option (Lambda architecture). But you could first try using the aggregation framework instead of executing JS files; it's much faster because it's native code and doesn't need to parse JS. – Christian Strempfer Nov 16 '14 at 13:40
  • That's right, it's just a layer. What do you mean by using the aggregation framework? Currently I have a cron job which runs a JS file, and inside that JS code it basically performs a few map-reduce jobs. Later, that 3rd-party app runs the required aggregation tasks on demand from PHP against Mongo and returns the results. – deividaspetraitis Nov 16 '14 at 13:50
  • Map-reduce is slow, see this [link](http://stackoverflow.com/q/13908438/199048). The aggregation framework is the new, much faster way to process data ([example](http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/)). – Christian Strempfer Nov 16 '14 at 13:57
  • Thank you very much for the explanations and answers — I'll take a look and try to optimize that way first of all! – deividaspetraitis Nov 16 '14 at 14:05
  • It does and I will, but right now I need at least 15 reputation to upvote. :) – deividaspetraitis Nov 16 '14 at 15:39
  • Your link to the "6 rules of thumb" was a godsend! Thank you, this was exactly what I needed to read. I read all three parts. I think it would be useful to post this link in a lot of threads. Thanks! – FireDragon Sep 22 '15 at 01:26