
For a complex real-time Apache Storm topology I need aggregates of my data (stored in Cassandra) for some computation steps. So far the data is queried with CQL (Cassandra Query Language) whenever it is needed and aggregated in a Storm bolt. That is a bit slow, so we want to cache the data needed for the aggregation. Two options are on the table:

  • Put the data needed in an indexed Ignite Cache and sliding-window-query it from Storm. In this case we would only need one Cache and use different queries, depending on the aggregation.
  • Put the data in Cassandra's in-memory, off-heap cache.

Argument for Ignite: We only need one indexed cache, while we would need one Cassandra table for each aggregation, for fast access. (Also ACID, but obviously we already live with CAP, so not a strong argument for our architects.)
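
To make the "one indexed cache, different queries" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 purely as a stand-in for an in-memory SQL store; it is not Ignite or Cassandra code and says nothing about their relative speed. The `events` table, its columns, and both helper functions are hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in only: an in-memory SQLite table plays the role of the single
# indexed cache. Table and column names are hypothetical.
def build_store(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (sensor TEXT, ts INTEGER, value REAL)")
    # One composite index serves the window queries below.
    conn.execute("CREATE INDEX idx_events_sensor_ts ON events (sensor, ts)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn

def window_avg(conn, sensor, start_ts, end_ts):
    """Average value for one sensor in the half-open window [start_ts, end_ts)."""
    row = conn.execute(
        "SELECT AVG(value) FROM events "
        "WHERE sensor = ? AND ts >= ? AND ts < ?",
        (sensor, start_ts, end_ts),
    ).fetchone()
    return row[0]

def window_max_per_sensor(conn, start_ts, end_ts):
    """Maximum value per sensor in the half-open window [start_ts, end_ts)."""
    rows = conn.execute(
        "SELECT sensor, MAX(value) FROM events "
        "WHERE ts >= ? AND ts < ? GROUP BY sensor ORDER BY sensor",
        (start_ts, end_ts),
    ).fetchall()
    return dict(rows)

# Sample data: (sensor, timestamp, value)
rows = [("a", 1, 10.0), ("a", 2, 20.0), ("a", 3, 30.0),
        ("b", 2, 5.0), ("b", 3, 7.0)]
conn = build_store(rows)
```

Both aggregations read the same table, and the window slides simply by changing `start_ts` and `end_ts`. With one Cassandra table per aggregation, each of these queries would instead need its own table keyed for that access pattern.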

Argument for Cassandra: We don't need to introduce a new technology.

But: What about speed? How fast would an indexed Ignite cache be compared to an optimized (= own table for each query) in-memory Cassandra?

Make42

1 Answer


I believe that in-memory indexed SQL in Ignite would be faster than Cassandra CQL queries. Apache Ignite is ANSI SQL-99 compliant, so you should be able to do all sorts of aggregations, joins, ORDER BY, GROUP BY, etc.

I will raise this within the Ignite community to see whether Cassandra CQL could be benchmarked against Ignite SQL. Once that is done, I will post the results here.

Dmitriy
  • If I have to aggregate and am able to do this on the database side, that would speed up the whole fetch (less data over the connection). But what if I just want the data without aggregation? What technical reason would there be for Ignite to be faster? – Make42 Nov 28 '15 at 17:54
  • Ignite In-Memory Data Fabric generally solves performance and scalability problems. If you don't need any of the Ignite SQL or caching features, and are happy with disk database performance, then I don't think there is a need for a switch. – Dmitriy Nov 30 '15 at 05:15
  • First of all, thank you for answering so reliably. Back to the topic: I am not happy with disk speed, but Cassandra would not run from disk; it would run in memory. Cassandra is usually disk-based, true, but it can cache data in memory. That is what I would do, and that is what I am comparing the Ignite cache to (see my question). So in my setting I DO need my data to be in memory (which both Cassandra and the Ignite cache provide), but I do NOT need Ignite's in-database aggregation capabilities, just the fetch. My question is how the two technologies compare under those circumstances. – Make42 Nov 30 '15 at 09:57
  • Any progress on this front? Have there been any benchmarks done? – Make42 Dec 14 '16 at 09:30
  • There are many cases where Ignite was used as a caching layer for Cassandra and provided an order-of-magnitude performance boost for both writes and reads. On top of that, you get the benefit of secondary indexes and in-memory query performance, including distributed joins. More on the Ignite Cassandra integration here: https://apacheignite-mix.readme.io/docs/ignite-with-apache-cassandra – Dmitriy Dec 14 '16 at 22:38
  • @Make42 the benchmarks were eventually published: https://dzone.com/articles/apachereg-ignite-and-apachereg-cassandra-benchmark – dmagda Mar 29 '18 at 14:57