Manage reports, when our database is Cassandra ...Spark or Solr...or BOTH?

Question

My database is Cassandra (DataStax Enterprise on Linux). Because it doesn't support GROUP BY, aggregates, and so on, Cassandra on its own is fundamentally not a good choice for reporting. I searched for ways around this limitation and found some results such as this, and this, and also this one.

But I became really confused! Hive uses additional tables of its own. Solr is better suited to full-text search and similar tasks. And Spark is useful for analysis, but I didn't understand whether it ultimately uses Hadoop or not.

I will have many reports, which need indexing and grouping at the very least. But I don't want to use additional tables, which would impose overhead. Also, I'm a .NET (not Java) developer, and my application is based on the .NET Framework too.

score 1 · Answer 1 · answered Mar 09 '16 at 13:26

I am not exactly sure what your question is here, and your confusion is understandable, as there is a lot going on with Cassandra and DSE.

You are correct in stating that Cassandra does not support the aggregation or GROUP BY functionality that you would want to use for reporting.
Solr (DSE Search) is used for ad-hoc and full-text searching of the data stored in Cassandra. This only works on a single table at a time.
Spark (DSE Analytics) provides analytics capabilities such as Map-Reduce, as well as the ability to filter and join tables. This is not done in real time, though, as the processing and shuffling of data can be expensive depending on the data load.
Spark does not require Hadoop. It performs many of the same jobs but is more efficient in many scenarios, as it allows for in-memory distributed processing of the data.
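To make the Map-Reduce point concrete, here is a minimal single-machine sketch in plain Python of the group-and-aggregate step that a Spark job would run in distributed form. It is not Spark or DSE code: the rows, the field names (`user`, `day`, `amount`) and the function names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Cassandra table (assumption, not real data).
ROWS = [
    {"user": "alice", "day": "2016-03-08", "amount": 20.0},
    {"user": "bob", "day": "2016-03-08", "amount": 5.0},
    {"user": "alice", "day": "2016-03-09", "amount": 12.5},
    {"user": "bob", "day": "2016-03-09", "amount": 7.5},
    {"user": "alice", "day": "2016-03-09", "amount": 2.5},
]


def map_phase(rows, key_field, value_field):
    """Emit one (key, value) pair per row."""
    return [(row[key_field], row[value_field]) for row in rows]


def shuffle_phase(pairs):
    """Collect values under their key; in a cluster this is the costly data shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each group into (row count, total)."""
    return {key: (len(values), sum(values)) for key, values in groups.items()}


def group_report(rows, key_field, value_field="amount"):
    """Equivalent of SELECT key, COUNT(*), SUM(value) ... GROUP BY key."""
    return reduce_phase(shuffle_phase(map_phase(rows, key_field, value_field)))
```

Calling `group_report(ROWS, "user")` groups by user, and `group_report(ROWS, "day")` groups by day. The shuffle step is the part that becomes expensive once the data is spread across nodes, which is why this kind of report is usually run as a batch job rather than per request.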

Since you are using DataStax Enterprise, the advantage is that you have built-in connectors to both Solr (DSE Search), to provide ad-hoc queries, and Spark (DSE Analytics), to provide analytics on your data.

Since I don't know your exact reporting requirements, it is difficult to give you a specific recommendation. If you can provide some additional details about what sort of reporting (scheduled versus ad-hoc, etc.) you will be running, I may be able to help you more.

The app has many users. It will have both scheduled and ad-hoc reporting. It should generate some reports periodically for the admin (daily, monthly, every four months, and yearly), and it should also support ad-hoc report requests from users. — Elnaz, Mar 09 '16 at 14:22
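The reporting periods listed in the comment above could be derived from a single event date along these lines. This is only a sketch: the function name and key format are hypothetical, and it assumes the four-month periods are aligned to the calendar year (January to April, May to August, September to December), which the thread does not specify.

```python
from datetime import date


def report_buckets(d):
    """Return the grouping key of each reporting period that contains date d."""
    # Assumption: four-month periods start in January, so 1 = Jan-Apr,
    # 2 = May-Aug, 3 = Sep-Dec.
    third = (d.month - 1) // 4 + 1
    return {
        "daily": d.isoformat(),
        "monthly": f"{d.year:04d}-{d.month:02d}",
        "four_monthly": f"{d.year:04d}-T{third}",
        "yearly": f"{d.year:04d}",
    }
```

Keys like these can serve as the group-by key for each scheduled report, whichever engine ends up doing the aggregation.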
