I was wondering if you could tell me which NoSQL db or technology/tools should I use for my scenario. We are looking at replacing our OLAP cubes based on SQL server Analysis services with an open source technology coz the data is getting too huge to manage and queries are taking too long to return. We have followed every rule in the book to shard the data, optimize the design of the cube by using aggregations and partitions etc and still some of our distinct count queries take 1-2 mins :( The data size of our fact table is roughly around 250GB. And there are 10-12 dimensions connected in star schema fashion.
So we decided to give open source technologies like Hadoop/HBase/NoSQL dbs a try to see if they can solve our OLAP scenarios with minimal setup and onboarding.
Our main requirements for the new technology are
It has to get blazing fast or instantaneous results for distinct count queries ( < 2 secs)
Supports the concept of measures and dimensions (like in OLAP).
- Support SQL like query language as many of our developers are SQL experts.
- Ability to connect Excel/Tableau to visualize the data.
As there are so many new technologies and tools in the open source world today, I was hoping if you can help me point to the right direction.