1

I am trying to design the application log table in Cassandra,

CREATE TABLE log(
  yyyymmdd varchar, 
  created timeuuid,  
  logMessage text,
  module text, 
  PRIMARY KEY(yyyymmdd, created)
);

Now when I try to perform the following queries it is working as expected,

select * from log where yymmdd = '20182302' LIMIT 50;

Above query is without grouping, kind of global.

Currently I did an secondary index for 'module' so I am able to perform the following,

select * from log where yymmdd = '20182302' WHERE module LIKE 'test' LIMIT 50;     

Now my concern is without doing the secondary index, Is there an efficient way to query based on the module and fetch the data (or) Is there a better design?

Also let me know the performance issue in current design.

Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
Harry
  • 3,072
  • 6
  • 43
  • 100

1 Answers1

2

For fetching based on module and date, you can only use another table, like this:

CREATE TABLE module_log(
  yyyymmdd varchar, 
  created timeuuid,  
  logMessage text,
  module text, 
  PRIMARY KEY((module,yyyymmdd), created)
);

This will allow to have single partition for every combination of the module & yyyymmdd values, so you won't have very wide partitions.

Also, take into account that if you created a secondary index only on module field - you may get problems with too big partitions (I assume that you have very limited number of module values?).

P.S. Are you using pure Cassandra, or DSE?

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
  • I thought of this solution, But It restricted it for query : 'select * from log where yymmdd = '20182302' LIMIT 50; ', Because I need a consolidated module logs too :( – Harry Feb 22 '18 at 10:14
  • Yes I use pure apache Cassandra – Harry Feb 22 '18 at 10:14
  • As we discussed before, even If I have a secondary index, If I query by partitionId, It will directly go to the partition node and then look for the secondary index table right? Also In one day we hardly cross 100K logs – Harry Feb 22 '18 at 10:21
  • 1
    It's slightly more complicated - it still may hit several machines to find values. This blog post describes secondary indices in more details: https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive – Alex Ott Feb 22 '18 at 10:34
  • Considering your solution, Is there a way to read all the consolidated module logs? – Harry Feb 22 '18 at 10:42
  • It's really depends on your requirements - you can insert into multiple tables from your app, or insert into one table and use secondary indices, or materialized views to query on another combinations of columns, or even use SASI indices (http://www.doanduyhai.com/blog/?p=2058). All of this depends on the requirements for latency, etc. - how often these data will be accessed, for example? – Alex Ott Feb 22 '18 at 10:47
  • It is a log so It will be queried frequently – Harry Feb 22 '18 at 11:27
  • 1
    But also, what is latency required - loading it for user console, or something like? If it's really requires very low latency, then having several tables tailored for it, may work better. – Alex Ott Feb 22 '18 at 11:29
  • Please check this : https://stackoverflow.com/questions/49135405/manual-pagination-in-cassandra – Harry Mar 07 '18 at 04:08
  • I need your help in this : https://stackoverflow.com/questions/49219277/chinese-language-in-cassandra – Harry Mar 11 '18 at 12:03