I have some Cassandra-related questions.

I have to store some data (about 10M rows): a natural key (sortable), an update timestamp, a createDate (YYYYMMDD only), and a value field. I plan to create the following CFs:

CREATE TABLE data (
  id text,
  createdate text,
  updatedate timeuuid,
  value text,
  PRIMARY KEY (id, updatedate)
);

CREATE TABLE data_createdate (
  id text,
  createdate text,
  value text,
  PRIMARY KEY (id, createdate)
);

My usage query will be like:

  • get all rows (id, value, createdate, updatedate); CQL like this will do: SELECT * FROM data

I am using Astyanax. How do I do paging? Do I have to set the partitioner to order-preserving so that I can use token(id) in a range to page through?

  • get all rows within an updatedate range; CQL like this will do: SELECT * FROM data WHERE updatedate > startdate AND updatedate < enddate

Again, how do I do paging?

  • get all rows within a createdate range. This is similar to the question above, but I can run the CQL against the data_createdate CF. Again, how do I do paging?

Any suggestions or comments? Thanks a lot.

Theo
Mike Wang

2 Answers

To implement paging, store the last key of each retrieved page; when you fetch the next page, use that saved key as the query's starting point. I suggest reading http://www.datastax.com/docs/1.2/cql_cli/using/paging.
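
A minimal sketch of that last-key approach in CQL, assuming the data table from the question. The page size of 1000, the id 'last-id' and the timeuuid literal are placeholders, not values from the question. As far as I know, token() works with the default random partitioner, so an order-preserving partitioner should not be needed for this. Because a partition can hold several rows, the sketch first finishes the partition where the previous page stopped and only then moves on by token:

```cql
-- First page: rows come back ordered by the token of the partition key (id).
SELECT id, updatedate, createdate, value FROM data LIMIT 1000;

-- Suppose the last row of that page had id = 'last-id' (placeholder) and the
-- updatedate shown below (placeholder). Fetch the rest of that partition first.
SELECT id, updatedate, createdate, value FROM data
 WHERE id = 'last-id'
   AND updatedate > 50554d6e-29bb-11e5-b345-feff819cdc9f
 LIMIT 1000;

-- Then continue with the partitions that follow it in token order.
SELECT id, updatedate, createdate, value FROM data
 WHERE token(id) > token('last-id')
 LIMIT 1000;
```

Repeat the last two statements, each time with the id and updatedate of the final row you received, until a query returns fewer rows than the limit and no partitions remain.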

abhi

In general, you want to avoid anything that requires iterating over all keys in a column family. Just as in an RDBMS, you should only run queries that have proper indexes set up.

Since updatedate is part of the compound primary key of the data table, you can use range queries on that column to do paging (exactly how to do paging in Cassandra is a pretty complex topic, unfortunately). This means that your first two use cases are actually the same.
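
A sketch of such a range query, under the assumption that you restrict it to a single id (the value 'some-id', the dates and the limit are placeholders). Since updatedate is a timeuuid, the built-in minTimeuuid and maxTimeuuid functions can turn plain timestamps into bounds:

```cql
-- All versions of one id updated strictly between the two dates.
SELECT id, updatedate, createdate, value FROM data
 WHERE id = 'some-id'
   AND updatedate > maxTimeuuid('2013-02-06 00:00+0000')
   AND updatedate < minTimeuuid('2013-02-28 00:00+0000')
 LIMIT 1000;
```

To get the next page for the same id, replace the lower bound with the updatedate of the last row you received.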

I'm not really sure what you mean by the third case. Do you mean that you want to query rows in data with a range query on createdate, e.g. SELECT * FROM data WHERE createdate > '20130206' AND createdate < '20130228'? I'm confused by your second table (data_createdate) and where it fits in.

If you mean what I think you mean, one solution could be to add a secondary index to the createdate column of data (CREATE INDEX data_createdate_index ON data (createdate)). You can read more about secondary indexing in the documentation.
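
A hedged sketch of how the index could be used. To my knowledge a secondary index lookup needs an equality condition on the indexed column, so this assumes a createdate range is covered by running one query per day; the date literal is a placeholder:

```cql
-- Index the non-key column so it can be used in a WHERE clause.
CREATE INDEX data_createdate_index ON data (createdate);

-- One lookup per day in the range of interest.
SELECT id, updatedate, createdate, value FROM data
 WHERE createdate = '20130206';
```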

Theo