My database is Cassandra (DataStax Enterprise on Linux). Since it doesn't support GROUP BY, aggregates, and so on, it is fundamentally not a good choice for reporting. I googled this limitation and found some results, such as this, this, and also this one.

But I got really confused! Hive uses additional, separate tables. Solr is better suited to full-text search and similar tasks. And Spark is useful for analysis, but I didn't understand whether it ultimately uses Hadoop or not.

I will have many reports that need indexing and grouping, at least. But I don't want to use additional tables, which would impose overhead. Also, I'm a .NET (not Java) developer, and my application is based on the .NET Framework, too.

Elnaz

1 Answer

I am not exactly sure what your question is here, and your confusion is understandable: with Cassandra and DSE there is a lot going on.

  • You are correct that Cassandra does not support the aggregation or group-by functionality that you would want to use for reporting.
  • Solr (DSE Search) is used for ad-hoc and full-text searching of the data stored in Cassandra. It only works on a single table at a time.
  • Spark (DSE Analytics) provides analytics capabilities such as Map-Reduce, as well as the ability to filter and join tables. This is not done in real time, though, because processing and shuffling the data can be expensive depending on the data load.
  • Spark does not require Hadoop. It performs many of the same jobs but is more efficient in many scenarios because it allows in-memory distributed processing of the data.
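
To make the "Map-Reduce" and "shuffling" points a bit more concrete, here is a minimal sketch in plain Python of the kind of group-and-total step a reporting job performs. It is not the Spark or DSE API and it does not touch Cassandra: the `rows` list, the field names, and the `total_by` helper are all hypothetical, made up only for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical rows, standing in for records read from a Cassandra table.
rows = [
    {"user": "alice", "day": "2016-03-08", "amount": 10.0},
    {"user": "bob", "day": "2016-03-08", "amount": 5.0},
    {"user": "alice", "day": "2016-03-09", "amount": 7.5},
]


def map_phase(rows, key_field, value_field):
    """Emit one (key, value) pair per row."""
    return [(row[key_field], row[value_field]) for row in rows]


def shuffle_phase(pairs):
    """Collect values by key; on a cluster this step moves data between nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each group's values into a single total."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def total_by(rows, key_field, value_field="amount"):
    """Group rows by key_field and sum value_field (illustrative helper)."""
    return reduce_phase(shuffle_phase(map_phase(rows, key_field, value_field)))


totals_per_user = total_by(rows, "user")
totals_per_day = total_by(rows, "day")
```

On a real cluster the shuffle step is the expensive part, since rows with the same key may live on different nodes, which is roughly why this kind of processing is not real-time.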

Since you are using DataStax Enterprise, the advantage is that you have built-in connectors to both Solr (DSE Search), to provide ad-hoc queries, and Spark (DSE Analytics), to provide analytics on your data.

Since I don't know your exact reporting requirements, it is difficult to give you a specific recommendation. If you can provide some additional details about what sort of reporting you will be running (scheduled versus ad-hoc, etc.), I may be able to help you more.

bechbd
  • The app has many users. It will have both scheduled and ad-hoc reporting. It should produce some reports periodically (daily, monthly, every four months, and yearly) for the admin, and it should also be able to support ad-hoc reporting requests from users. – Elnaz Mar 09 '16 at 14:22