
I want to get the last inserted row in a Cassandra table. How can I get it? Any ideas?

I am developing a project in which I am replacing MySQL with Cassandra. I want to get rid of all the SQL queries and rewrite them for Cassandra.

simo
Mohanish
  • In any clustered environment, it is virtually impossible to determine the last inserted entry. Cassandra is no exception. – Jan Dörrenhaus Feb 17 '16 at 14:18
  • For Cassandra people: the concept of the "last inserted row" in MySQL is actually the (autoincrement) id of the row that was just inserted by your client via an INSERT statement. This is a real problem in Cassandra if, for example, your primary key is a `timeuuid` and you use the `now()` function to automatically produce the value for the new row. There's apparently no way to determine the timeuuid of the new row even though that seems like an easy thing to add to the driver. – Shannon Jun 29 '17 at 21:01

2 Answers


Just to impart a little understanding...

As with all Cassandra query problems, the query needs to be served by a model specifically designed for it. This is known as query-based modeling. Querying the last inserted row is not an intrinsic capability built into every table. You would need to design your model to support that ahead of time.

For instance, let's say I have a table storing data for users.

CREATE TABLE users (
  username TEXT,
  email TEXT,
  firstname TEXT,
  lastname TEXT,
  PRIMARY KEY (username));

If I were to run a SELECT * FROM users LIMIT 1 on this table, my result set would contain a single row. That row would be the one containing the lowest hashed value of username (my partition key), because that's how Cassandra stores data in the cluster. I would have no way of knowing if it was the last one added or not, so this wouldn't be terribly useful to you.
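To make that ordering concrete, here is a small Python sketch of the idea. It is only a model: `token()` and `select_limit_1()` are hypothetical helpers, and MD5 is used as a stand-in for Cassandra's partitioner (Murmur3 by default), so the actual token values will not match a real cluster. The point it illustrates is that the row returned by an unrestricted LIMIT 1 depends on the hashed partition key, not on the order in which rows were written.

```python
import hashlib


def token(partition_key: str) -> int:
    """Hypothetical stand-in for the partitioner's hash of a partition key.

    Assumption: MD5 is used only because it ships with Python. Cassandra's
    default partitioner is Murmur3, so real token values differ.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def select_limit_1(partition_keys):
    """Models SELECT ... LIMIT 1 with no WHERE clause: lowest token comes first."""
    return min(partition_keys, key=token)


# The same three usernames, written in two different orders.
insertion_order = ["bkerman", "jkerman", "vkerman"]
first = select_limit_1(insertion_order)
first_reversed = select_limit_1(list(reversed(insertion_order)))

# Both calls pick the same username, whichever one was inserted last.
print(first, first_reversed)
```

Whichever username comes back, it is the same for both insertion orders, which is why LIMIT 1 on its own says nothing about recency.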

On the other hand, let's say I had a table designed to track updates that users had made to their account info.

CREATE TABLE userUpdates (
  username TEXT,
  lastUpdated TIMEUUID,
  email TEXT,
  firstname TEXT,
  lastname TEXT,
  PRIMARY KEY (username,lastUpdated))
WITH CLUSTERING ORDER BY (lastUpdated DESC);

Next I'll upsert 3 rows:

> INSERT INTO userUpdates (username,lastUpdated,email,firstname,lastname) 
  VALUES ('bkerman',now(),'bkerman@ksp.com','Bob','Kerman');
> INSERT INTO userUpdates (username,lastUpdated,email,firstname,lastname) 
  VALUES ('jkerman',now(),'jkerman@ksp.com','Jebediah','Kerman');
> INSERT INTO userUpdates (username,lastUpdated,email,firstname,lastname) 
  VALUES ('bkerman',now(),'bobkerman@ksp.com','Bob','Kerman');

> SELECT username, email, dateof(lastUpdated) FROM userupdates;

 username | email             | system.dateof(lastupdated)
----------+-------------------+----------------------------
  jkerman |   jkerman@ksp.com |   2016-02-17 15:31:39+0000
  bkerman | bobkerman@ksp.com |   2016-02-17 15:32:22+0000
  bkerman |   bkerman@ksp.com |   2016-02-17 15:31:38+0000

(3 rows)

If I just SELECT username, email, dateof(lastUpdated) FROM userupdates LIMIT 1, I'll get Jebediah Kerman's data, which is not the most recently updated row. However, if I restrict the query to the partition username='bkerman', a LIMIT 1 will return the most recent row for Bob Kerman.

> SELECT username, email, dateof(lastUpdated) FROM userupdates WHERE username='bkerman' LIMIT 1;

 username | email             | system.dateof(lastupdated)
----------+-------------------+----------------------------
  bkerman | bobkerman@ksp.com |   2016-02-17 15:32:22+0000

(1 rows)

This works, because I specified a clustering order of descending on lastUpdated:

WITH CLUSTERING ORDER BY (lastUpdated DESC);

In this way, results within each partition will be returned with the most-recently upserted row at the top, hence LIMIT 1 becomes the way to query the most-recent row.
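The per-partition behaviour can be sketched in plain Python as well. This is a rough model under stated assumptions, not driver code: the `user_updates` dict stands in for the table, and `insert_update()` and `latest_for()` are hypothetical helpers. One deliberate difference from the CQL above is that the TIMEUUID is generated on the client with `uuid.uuid1()` (a version 1, time-based UUID) instead of `now()`, so the caller knows the key of the row it just wrote.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-in for the userUpdates table:
# partition key (username) -> rows kept newest-first, mimicking
# CLUSTERING ORDER BY (lastUpdated DESC).
user_updates = defaultdict(list)


def insert_update(username, email):
    """Add a row and return its client-generated TIMEUUID."""
    last_updated = uuid.uuid1()  # version 1 UUID, i.e. time-based
    rows = user_updates[username]
    rows.append({"username": username, "lastUpdated": last_updated, "email": email})
    # Keep the partition sorted by the UUID's embedded timestamp, newest first.
    rows.sort(key=lambda row: row["lastUpdated"].time, reverse=True)
    return last_updated


def latest_for(username):
    """Models SELECT ... WHERE username = ? LIMIT 1 on this table."""
    rows = user_updates.get(username, [])
    return rows[0] if rows else None


insert_update("bkerman", "bkerman@ksp.com")
insert_update("jkerman", "jkerman@ksp.com")
newest_id = insert_update("bkerman", "bobkerman@ksp.com")

latest = latest_for("bkerman")
print(latest["email"], latest["lastUpdated"])
```

Because the rows are kept newest-first inside each partition, reading the first row of one partition is cheap, while nothing in this structure can answer "newest row across all partitions" without scanning every partition.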

In summary, it is important to understand that:

  • Cassandra orders data in the cluster by the hashed value of a partition key. This helps ensure more-even data distribution.
  • Cassandra CLUSTERING ORDER enforces the on-disk sort order of rows within a partition.
  • While you won't be able to get the most-recently upserted row across a whole table, you can design models to return that row to you for each partition.

tl;dr: Querying in Cassandra is very different from querying MySQL or any other RDBMS. If querying the last upserted row (for a partition) is something you need to do, there are probably ways in which you can model your table to support it.

Aaron

I want to get last inserted row in Cassandra table. How to get it? Any idea?

It is not possible. What you are asking for is a queue pattern (give me the last message in), and a queue is a known anti-pattern for Cassandra.

doanduyhai