Cassandra efficient table walk

Question

I'm currently working on a benchmark (part of my bachelor thesis) that compares SQL and NoSQL databases based on an abstract data model and abstract queries, so that the implementation is fair on all systems.

Right now I'm implementing one of these queries against a Cassandra table that is defined as follows:

CREATE TABLE allocated(
    partition_key int, 
    financial_institution varchar, 
    primary_uuid uuid,
    report_name varchar,
    view_name varchar,
    row_name varchar,
    col_name varchar,
    amount float,
PRIMARY KEY (partition_key, report_name, primary_uuid));

This table contains about 100,000,000 records (~300GB).

We now need to calculate the sum for the field "amount" for every possible combination of report_name, view_name, col_name and row_name.

In SQL this would be quite easy: just select sum(amount) and group by the fields you want. However, since Cassandra does not support these operations (which is perfectly fine), I need to achieve this in another way.

Currently I achieve this by doing a full-table walk, processing each record and storing the sum in a HashMap in Java for each combination. The prepared statement I use is as follows:

SELECT 
   partition_key, 
   financial_institution,
   report_name, 
   view_name, 
   col_name, 
   row_name, 
   amount 
FROM allocated;
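
For reference, the HashMap aggregation described above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the Row class and the AmountAggregator name are hypothetical, and the rows would really come from iterating the driver's result set rather than from an in-memory list. It keeps one map entry per (report_name, view_name, col_name, row_name) combination, so memory grows with the number of distinct combinations, not with the number of rows, provided the rows themselves are streamed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AmountAggregator {
    // Hypothetical row holder; the fields mirror the grouping columns of `allocated`.
    static final class Row {
        final String reportName, viewName, colName, rowName;
        final float amount;

        Row(String reportName, String viewName, String colName, String rowName, float amount) {
            this.reportName = reportName;
            this.viewName = viewName;
            this.colName = colName;
            this.rowName = rowName;
            this.amount = amount;
        }
    }

    // Sums `amount` per (report_name, view_name, col_name, row_name) combination.
    static Map<List<String>, Double> aggregate(Iterable<Row> rows) {
        Map<List<String>, Double> sums = new HashMap<>();
        for (Row r : rows) {
            // A List has value-based equals/hashCode, so it can serve as a composite key.
            List<String> key = Arrays.asList(r.reportName, r.viewName, r.colName, r.rowName);
            // Accumulate in double to limit rounding error when adding many floats.
            sums.merge(key, (double) r.amount, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("r1", "v1", "c1", "x1", 1.5f),
                new Row("r1", "v1", "c1", "x1", 2.0f),
                new Row("r2", "v1", "c1", "x1", 4.0f));
        aggregate(rows).forEach((key, sum) -> System.out.println(key + " -> " + sum));
    }
}
```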

That partially works on machines with lots of RAM for both Cassandra and the Java app, but it crashes on smaller machines.

Now I'm wondering whether it's possible to achieve this in a faster way. I could imagine using partition_key, which also serves as the Cassandra partition key, and doing this separately for every partition (I have 5 of them).

I also thought of doing this multithreaded, by assigning every partition and report to a separate thread and running them in parallel. But I guess this would cause a lot of overhead on the application side.

Now to the actual question: Would you recommend another execution strategy to achieve this? Maybe I still think too much in a SQL-like way.

Thank you for your support.

score 3 · Accepted Answer · answered Jan 19 '14 at 07:47

Here are two ideas that may help you.

1) You can efficiently scan rows in any table using the following approach. Consider a table with PRIMARY KEY (pk, sk, tk); in your table these correspond to partition_key, report_name and primary_uuid. Let's use a fetch size of 1000, but you can try other values.

First query (Q1):

select whatever_columns from allocated limit 1000;

Process these and then record the values of the three primary key columns from the last row returned. Let's say these values are pk_val, sk_val, and tk_val. Here is your next query (Q2):

select whatever_columns from allocated where token(pk) = token(pk_val) and sk = sk_val and tk > tk_val limit 1000;

The above query looks for records with the same pk and sk, but with the next values of tk. Keep repeating it as long as you keep getting 1000 records. When you get fewer, you drop the tk condition and use greater-than on sk. Here is the query (Q3):

select whatever_columns from allocated where token(pk) = token(pk_val) and sk > sk_val limit 1000;

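After each page of 1000 rows from Q3, go back to Q2 with the key values of the last record, because that last sk may still have rows with larger tk values that a repeated Q3 would skip. When Q3 returns fewer than 1000 rows, the current pk is exhausted, and you run the following query (Q4):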

select whatever_columns from allocated where token(pk) > token(pk_val) limit 1000;

Now, you again use the pk_val, sk_val, tk_val from the last record, and run Q2 with these values, then Q3, then Q4, and so on.

You are done when Q4 returns fewer than 1000 rows.
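
The Q1 to Q4 loop can be sketched as a small state machine. This is a minimal sketch under loud assumptions: it runs against an in-memory list instead of Cassandra, each row is just {pk, sk, tk}, token(pk) is simulated by pk itself, and TokenWalk, query and walk are hypothetical names. It also returns to Q2 after every full page, including a full Q3 page, so that rows sharing the last sk are not skipped; that is one reading of "run Q2 with these values, then Q3, then Q4", not necessarily the only one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class TokenWalk {
    // Stand-in for "SELECT ... WHERE <where> LIMIT <limit>". Each row is {pk, sk, tk};
    // the table is assumed sorted by (pk, sk, tk), and token(pk) is simulated by pk.
    static List<int[]> query(List<int[]> table, Predicate<int[]> where, int limit) {
        List<int[]> page = new ArrayList<>();
        for (int[] row : table) {
            if (page.size() == limit) break;
            if (where.test(row)) page.add(row);
        }
        return page;
    }

    // Visits every row once, in table order, using only limited queries (limit >= 1).
    static List<int[]> walk(List<int[]> table, int limit) {
        List<int[]> seen = new ArrayList<>();
        List<int[]> page = query(table, r -> true, limit);                   // Q1
        seen.addAll(page);
        if (page.size() < limit) return seen;                                // table fit in one page
        int step = 2;
        while (true) {
            int[] l = seen.get(seen.size() - 1);                             // last row processed
            if (step == 2) {                                                 // Q2: same pk and sk, next tk
                page = query(table, r -> r[0] == l[0] && r[1] == l[1] && r[2] > l[2], limit);
            } else if (step == 3) {                                          // Q3: same pk, next sk
                page = query(table, r -> r[0] == l[0] && r[1] > l[1], limit);
            } else {                                                         // Q4: next pk
                page = query(table, r -> r[0] > l[0], limit);
            }
            seen.addAll(page);
            boolean full = page.size() == limit;
            if (step == 4 && !full) return seen;                             // short Q4 page: done
            step = full ? 2 : step + 1;                                      // full page: drain via Q2
        }
    }

    public static void main(String[] args) {
        List<int[]> table = new ArrayList<>();
        for (int pk = 1; pk <= 3; pk++)
            for (int sk = 1; sk <= 4; sk++)
                for (int tk = 1; tk <= 5; tk++) table.add(new int[] {pk, sk, tk});
        System.out.println(walk(table, 7).size() + " of " + table.size() + " rows visited");
    }
}
```

In a real client, each query call would be a prepared statement bound to the key values of the last row, and rows would be processed as they arrive instead of being collected in a list.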

2) I am assuming that 'report_name, view_name, col_name and row_name' are not unique and that's why you maintain a HashMap to keep track of the total amount whenever you see the same combination again. Here is something that may work better. Create a table in Cassandra whose key is a combination of these four values (maybe delimited). If there were three, you could have simply used a composite key for those three. You also need a column called amounts, which is a list. As you scan the allocated table (using the approach above), you do the following for each row:

update amounts_table set amounts = amounts + [whatever_amount] where my_primary_key = four_col_values_delimited;

Once you are done, you can scan this table and compute the sum of the list for each row you see and dump it wherever you want. Note that since there is only one key, you can scan using only token(primary_key) > token(last_value_of_primary_key).
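
The client-side pieces of this idea, building the delimited key and summing a list read back from amounts_table, might look like the sketch below. It is an illustration only: AmountsTableSketch, key and sum are hypothetical names, and the delimiter is an assumption that only works if that character never occurs in any of the four column values.

```java
import java.util.Arrays;
import java.util.List;

final class AmountsTableSketch {
    // ASCII unit separator; assumed never to appear in the four column values.
    static final String DELIM = "\u001F";

    // Builds the single delimited key used as the primary key of amounts_table.
    static String key(String reportName, String viewName, String colName, String rowName) {
        return String.join(DELIM, reportName, viewName, colName, rowName);
    }

    // Sums one row's `amounts` list; accumulates in double to limit rounding error.
    static double sum(List<Float> amounts) {
        double total = 0.0;
        for (float amount : amounts) {
            total += amount;
        }
        return total;
    }

    public static void main(String[] args) {
        String k = key("report", "view", "col", "row");
        System.out.println(Arrays.toString(k.split(DELIM)) + " -> " + sum(Arrays.asList(1.5f, 2.0f)));
    }
}
```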

Sorry if my description is confusing. Please let me know if this helps.

To #2) I'll probably reimplement it that way. It sounds way better to me than my HashMap implementation, which eventually led me to about 2GB of RAM usage for my JVM yesterday. To #1): I got the concept for the most part. I'm not sure about Query3 and Query4. Assuming that pk,sk,tk are ordered in a hierarchy, when getting less than 1000 results in Q2, wouldn't I have to jump to the next sk immediately, since all combinations for the same (pk, sk) and a random tk are exhausted? ...Or maybe I don't get the point. Thanks again for your help :) — hoffmax91, Jan 19 '14 at 11:05
"Assuming that pk,sk,tk are ordered in a hierarchy, when getting less than 1000 results in Q2, wouldn't I have to jump to the next sk immediately" When Q2 returns less than 1000, you know that there are no more values for the current pk_val, sk_val combination (all tk_val have been scanned). So, you need to look for the next sk values for the same pk_val, which can be done using Q3. (Sorry for edits, but I didn't realize that hitting enter would post the comment. Also, seems like formatting isn't working in comments.). — M K, Jan 20 '14 at 03:53
