0

I have cassandra table containing information about ship identities. It has column: imo_number, mmsi_number, ship_name, timestamp.

I have 2 requirement for this table:

  1. I want this table to only save row with last updated records, according to column imo_number, mmsi_number, and ship_name. So if any new record with value of those column is exacly same, it will be updated. As I know, I need to define those column as primary key in cassandra.
create table keyspace.table (
    mmsi_number text,
    imo_number text,
    ship_name text,
    timestamp timestamp,
    primary key ((mmsi_number), imo_number, ship_name)
);
  1. I want to be able to load data from table by mmsi as where clause, and sort it by timestamp to find the latest record, or find records between a date range. As I know, the CQL schema will be:
create table keyspace.table (
    mmsi_number text,
    imo_number text,
    ship_name text,
    timestamp timestamp,
    primary key ((mmsi_number), timestamp, imo_number, ship_name)
) with clustering order by (timestamp desc);

The problem is, when I use my second schema, my first requirement will not be fulfilled. Because every new record will have different timestamp, so it will be inserted instead updated.

How do I fulfilled my requirements above? Or maybe I did something wrong? Help appreciated.

panoet
  • 3,608
  • 1
  • 16
  • 27
  • how many unique combinations are for imo_number, mmsi_number, and ship_name ? – Alex Ott Feb 17 '20 at 09:47
  • I can say, unlimited. Because those data come from sensor that alive 24 hours / 7 days. And actually it is human input, so we can't control it. – panoet Feb 18 '20 at 01:34

0 Answers0