I am working on the design and implementation of a (near) real-time web analytics engine, similar to Google Analytics and ChartBeat. We expect nearly 150M requests/day. We have 5 to 8 machines available, each with a 2.5 GHz 8-core CPU and 16 GB of RAM.
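
To put those numbers in perspective, here is a rough back-of-the-envelope sketch of the load per machine. It assumes traffic is spread evenly over the day and evenly across machines, which real traffic will not be; the `peak_factor` parameter and the function names are hypothetical, added only for illustration.

```python
# Back-of-the-envelope sizing from the figures in the question.
# Assumption: load is uniform over the day unless a peak_factor is given.

REQUESTS_PER_DAY = 150_000_000  # "nearly 150M requests/day"
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(requests_per_day: int) -> float:
    """Average requests per second over a full day."""
    return requests_per_day / SECONDS_PER_DAY


def per_machine_rps(requests_per_day: int, machines: int,
                    peak_factor: float = 1.0) -> float:
    """Requests per second each machine must absorb.

    peak_factor is a hypothetical multiplier for busy hours
    (1.0 means perfectly even traffic).
    """
    return average_rps(requests_per_day) * peak_factor / machines


if __name__ == "__main__":
    print(f"cluster average: {average_rps(REQUESTS_PER_DAY):.0f} req/s")
    for machines in (5, 8):  # "5 to 8 machines"
        rate = per_machine_rps(REQUESTS_PER_DAY, machines)
        print(f"{machines} machines: {rate:.0f} req/s each")
```

On average this works out to roughly 1,700 requests per second for the whole cluster, so the ingest rate itself is modest; the harder part is likely the storage and aggregation of the accumulated data.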

I am looking at horizontally scalable solutions for this requirement, and I am currently evaluating the mongo-hadoop combination. From what I understand so far, it would be difficult to keep all the data in one place (on one machine) for analysis, so Hadoop as the data processor with MongoDB as the data store looks like a good combination to me.

Is there a standard or (I should say) proven architecture for this kind of application? What design considerations should I take into account? Has the mongo-hadoop combination worked for anybody?

dvl

1 Answer

I assume you have already read this?

http://www.mongodb.org/display/DOCS/Hadoop+Quick+Start

More details and working examples for a sharded configuration are here: http://www.slideshare.net/spf13/mongodb-and-hadoop

Calvin Cheng
  • Yes, I have read the documentation, but it does not mention whether the adapter also works with a sharded configuration. I am looking for more details than a short example. – dvl Nov 18 '12 at 11:35
  • Yes, it does work with a sharded configuration. Good demo examples are here: http://www.slideshare.net/spf13/mongodb-and-hadoop – Calvin Cheng Nov 18 '12 at 11:36