I am working on the design & implementation of a (near) real-time web-analytics engine. This is similar to Google Analytics and ChartBeat. Nearly 150M requests/day are expected. We have an availability of 5 to 8 machines with 2.5GHz (8 core) CPU and 16 GB of RAM each.
I am looking at horizontally scalable solutions for this requirement. Currently, I am analyzing mongo-hadoop combination for this purpose. From what I have understood till now is that it would be difficult to keep all the data at one place (one machine) for analysis. So, Hadoop as data processor and MongoDB as data storage is appearing a good combination to me.
Is there a standard or (I should say) a proven architecture for this kind of an application? What are the design considerations I should take? Is mongo-hadoop combination working for somebody?