
I am new to CQL, please help!

I am trying to answer the question "Which URL on the website has been accessed the most? How many accesses were made on it?" from a table I have created.

The IP values are saved as text here.

To solve the above question, I am trying to use functions to aggregate a count for all the common IPs and then pick the maximum one. This is the approach I have in mind, and I am referring to http://christopher-batey.blogspot.com/2015/05/cassandra-aggregates-min-max-avg-group.html to understand how to write the functions.

It is printing the sum for all the URLs, whereas I am just looking for the maximum one.

Snippet of how I am calling it

Snippet of part of the output

Erick Ramirez
krk

1 Answer


Aggregating across the whole table like this isn't a good fit for Cassandra. It won't scale as your dataset or cluster grows because it requires a full table scan.

For analytics workloads, we recommend that you use Spark with the spark-cassandra-connector since it will optimise the CQL queries. Cheers!
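To make the target computation concrete, here is a minimal sketch in plain Python of the logic such a job has to carry out: count the rows per URL, then keep only the URL with the largest count. The row shape, the `url` and `ip` keys, the sample values, and the `most_accessed_url` helper are all hypothetical. This only illustrates the aggregation on rows already held in memory. It is not a Cassandra or Spark API and does not avoid the full table scan.

```python
from collections import Counter


def most_accessed_url(rows):
    """Return (url, access_count) for the most frequently accessed URL.

    `rows` is an iterable of dicts with a "url" key (hypothetical shape).
    Returns None when there are no rows.
    """
    # Step 1: group by URL and count the accesses for each one.
    counts = Counter(row["url"] for row in rows)
    if not counts:
        return None
    # Step 2: keep only the single largest group instead of printing all of them.
    # most_common(1) returns a one-element list of (url, count).
    return counts.most_common(1)[0]


# Hypothetical sample data standing in for rows read from the table.
sample_rows = [
    {"ip": "10.0.0.1", "url": "/index.html"},
    {"ip": "10.0.0.2", "url": "/about.html"},
    {"ip": "10.0.0.1", "url": "/index.html"},
    {"ip": "10.0.0.3", "url": "/index.html"},
    {"ip": "10.0.0.2", "url": "/contact.html"},
]

print(most_accessed_url(sample_rows))
```

Printing every per-URL total, as in the question's output, corresponds to stopping after step 1. The single answer comes from the extra reduction in step 2.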

Erick Ramirez
  • So you're saying that regardless of the index here a columnar DB would still require additional tools to compute aggregates? – Alex Klaus Apr 08 '22 at 04:06
  • Cassandra isn't a columnar database. It's designed for OLTP workloads so analytics queries need to be optimised using Spark + the Spark connector. Cheers! – Erick Ramirez Apr 08 '22 at 04:46