I am developing an application for a sales force, and I cannot figure out how to manage big data in it. Here is the scenario.
Locations are organized in the following hierarchy:
Country => State => City => Territory => Area => Outlet
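One common way to model a fixed hierarchy like this in a NoSQL store, which has no joins, is to keep each outlet together with its full ancestor path so that a report can group by any level directly. The sketch below assumes that denormalized layout; the class `OutletLocation`, the method `path_to`, and the placeholder names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Tuple

# Hierarchy levels from the question, broadest first.
LEVELS = ("country", "state", "city", "territory", "area", "outlet")


@dataclass(frozen=True)
class OutletLocation:
    """Hypothetical record: one outlet plus every ancestor in the hierarchy."""
    country: str
    state: str
    city: str
    territory: str
    area: str
    outlet: str

    def path_to(self, level: str) -> Tuple[str, ...]:
        """Return the names from country down to and including `level`."""
        depth = LEVELS.index(level) + 1
        return tuple(getattr(self, name) for name in LEVELS[:depth])


# Placeholder names, not real data.
loc = OutletLocation("Country1", "State1", "City1", "Territory1", "Area1", "Outlet1")
print(loc.path_to("city"))  # ('Country1', 'State1', 'City1')
```

Grouping sales rows by `path_to("territory")`, for example, would give territory-level totals without looking up parent records.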
My table for daily sales is roughly structured as follows (sample values for each column):
Outlet ID - 1, 2, 3, 4, 5, 6 ...
User ID - EMP001, EMP002, EMP003, EMP004, EMP005, EMP006 ...
Product ID - 78, 54, 21, 11, 09, 83 ...
Quantity - 12, 34, 67, 43, 70, 03 ...
Date & Time - 01/05/2014 11:00, 01/05/2014 12:00, 01/05/2014 14:00 ...
There are other fields as well. Many reports will be built on this data, and they need to be viewed in real time.
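To make the kind of report concrete, here is a minimal sketch of one row type and one daily aggregate (total quantity per product per day). It pairs the sample column values above positionally and assumes `01/05/2014` means 1 May 2014 (day/month order); `SaleRow` and `quantity_by_day_and_product` are hypothetical names, and the "other fields" are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SaleRow:
    # Fields mirror the columns listed above; "other fields" are omitted.
    outlet_id: int
    user_id: str
    product_id: int
    quantity: int
    sold_at: datetime


def quantity_by_day_and_product(rows):
    """Sum quantity per (calendar day, product), a typical daily sales report."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row.sold_at.date(), row.product_id)] += row.quantity
    return dict(totals)


# First three sample rows, assuming day/month date order.
rows = [
    SaleRow(1, "EMP001", 78, 12, datetime(2014, 5, 1, 11, 0)),
    SaleRow(2, "EMP002", 54, 34, datetime(2014, 5, 1, 12, 0)),
    SaleRow(3, "EMP003", 21, 67, datetime(2014, 5, 1, 14, 0)),
]
report = quantity_by_day_and_product(rows)
print(report)
```

Computing such totals at query time over a full day's rows is what the analytics layer would have to do quickly; an alternative is to maintain these counters as rows arrive.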
We insert about 1 million rows per day. I have narrowed my choice of NoSQL database down to Cassandra.
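A quick back-of-the-envelope check of that volume helps size the problem. The sketch below only uses the 1 million rows/day figure from the question and assumes inserts are spread evenly over 24 hours; real sales traffic concentrated in business hours would peak well above this average.

```python
ROWS_PER_DAY = 1_000_000  # figure stated in the question

# Average rate, assuming inserts are spread evenly over the whole day.
rows_per_second = ROWS_PER_DAY / (24 * 60 * 60)
# Total volume retained after one year (365 days).
rows_per_year = ROWS_PER_DAY * 365

print(f"{rows_per_second:.1f} rows/s on average")  # 11.6 rows/s on average
print(f"{rows_per_year:,} rows per year")          # 365,000,000 rows per year
```

The insert rate itself is modest; the harder part is querying hundreds of millions of accumulated rows for real-time reports.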
Now I need a tool that can query this data and handle real-time analytics. I have heard and read about open-source tools such as HBase, Pig, Hive, PrestoDB, Impala, Spark, and Shark.
Currently I cannot judge which of these is the best fit for my application's real-time analytics and product sales forecasting.
Your help and guidance will be highly appreciated.
Thanks