In Cassandra, a row can be very wide and hold time-series data. For example, one row could look like the following:

RowKey: "weather"
name=2013-01-02:temperature, value=90,
name=2013-01-02:humidity, value=23,
name=2013-01-02:rain, value=false,
name=2013-01-03:temperature, value=91,
name=2013-01-03:humidity, value=24,
name=2013-01-03:rain, value=false,
name=2013-01-04:temperature, value=90,
name=2013-01-04:humidity, value=23,
name=2013-01-04:rain, value=false

That is nine columns holding three days of weather data. Time is part of the primary key in this row, so the columns in the row are ordered by time.

My question: is there any way to run a query like "what is the humidity value for the first/last day in this row?" I know I could use an ORDER BY clause in CQL, but since this row is already sorted by time, there should be some way to fetch the first or last entry directly instead of sorting again. Or does Cassandra already optimize ORDER BY under the hood?

Another approach I can think of is to store an extra column in this row called "last_time_stamp" that is updated whenever new data is inserted. But that would require one more update every time I insert new weather data.

Thanks for any suggestions! :)

1 Answer

Without seeing more of your actual table, I suggest using a timestamp (or a timeuuid if collisions are possible) as the second component of a compound primary key. With that in place, you can get the last "row" by selecting with ORDER BY t DESC LIMIT 1, where t is that timestamp column.
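
As a rough sketch of that suggestion: the table and column names below (weather_by_station, station, t) are made up for illustration, since the question does not show the real schema. Because ORDER BY is only allowed on clustering columns, the query should walk the partition in its stored order (or in reverse) rather than sort the data again.

```cql
-- Hypothetical schema; all names here are assumptions, not the asker's table.
CREATE TABLE weather_by_station (
    station text,        -- partition key, plays the role of the "weather" row key
    t timestamp,         -- clustering column: entries are stored sorted by t
    temperature int,
    humidity int,
    rain boolean,
    PRIMARY KEY (station, t)
);

-- Last day's humidity: read the partition in reverse and stop after one entry.
SELECT t, humidity
FROM weather_by_station
WHERE station = 'weather'
ORDER BY t DESC
LIMIT 1;

-- First day's humidity: ascending is the default clustering order.
SELECT t, humidity
FROM weather_by_station
WHERE station = 'weather'
LIMIT 1;
```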

You could also change the clustering order in your schema to order it naturally for "last N" queries.
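
A minimal sketch of the clustering-order variant, again with hypothetical names (weather_latest_first, station, t) since the actual table is not shown. With the clustering order declared as descending, the newest entries sit at the start of the partition, so a plain LIMIT should return them without an ORDER BY clause.

```cql
-- Hypothetical schema; names are assumptions for illustration only.
CREATE TABLE weather_latest_first (
    station text,        -- partition key
    t timestamp,         -- clustering column
    temperature int,
    humidity int,
    rain boolean,
    PRIMARY KEY (station, t)
) WITH CLUSTERING ORDER BY (t DESC);   -- store newest entries first

-- "Last N" query: the three most recent readings, in stored order.
SELECT t, humidity
FROM weather_latest_first
WHERE station = 'weather'
LIMIT 3;
```

The trade-off is that "first N" queries on this table would then need ORDER BY t ASC, so it makes sense to pick the order that matches the most common read pattern.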

Please see examples and linked resource in this answer.

Adam Holmberg