
I have 100 million rows in a Cassandra table. The schema is: id int, key varchar, row_hash varchar, version int, and the primary key is ((version), id). The statement that creates the table is:

c_sql = "CREATE TABLE IF NOT EXISTS {} (id varchar, version int, row_hash varchar, PRIMARY KEY((version), id))".format( self.table_name )

Does this statement make version the partition key?

Also, my SELECT query, which takes longer and longer as the number of rows grows, is:

row_check_query = "SELECT {} FROM {} WHERE {}={} AND {}='{}' ".format( "row_hash", self.table_name, "version", self.version, "id", key )
aviral sanjay
  • For your first question I would take a look [here](https://stackoverflow.com/questions/24949676/difference-between-partition-key-composite-key-and-clustering-key-in-cassandra). – LemonPy Jan 22 '19 at 11:44
  • @IftahP I actually wrote this code looking at that answer :p Could you evaluate my concepts? Thanks! – aviral sanjay Jan 22 '19 at 12:02

1 Answer


Yes, version is the partition key. id is a clustering column in your case.
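
To make the rule concrete, here is a minimal Python sketch (not Cassandra's own parser) that applies the CQL convention: columns inside the inner parentheses form the partition key, and without inner parentheses only the first column does. The function name `split_primary_key` is made up for this illustration, and it only handles plain column names.

```python
def split_primary_key(clause):
    """Split the body of a CQL PRIMARY KEY(...) clause into
    (partition_key_columns, clustering_columns).

    Illustrative helper only; handles simple, unquoted column names.
    """
    body = clause.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("expected a parenthesised column list")
    # Drop the outer parentheses of PRIMARY KEY( ... ).
    body = body[1:-1].strip()

    if body.startswith("("):
        # Inner parentheses group the partition key columns.
        close = body.index(")")
        partition = [c.strip() for c in body[1:close].split(",") if c.strip()]
        rest = body[close + 1:]
    else:
        # Without inner parentheses, only the first column is the partition key.
        first, _, rest = body.partition(",")
        partition = [first.strip()]

    # Everything after the partition key is a clustering column.
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering


print(split_primary_key("((version), id)"))   # the key from the question
print(split_primary_key("(version, id)"))     # same meaning, no inner parentheses
print(split_primary_key("((version, id))"))   # both columns in the partition key
```

The first two forms give the same result: version as the partition key and id as a clustering column. Only the third form puts both columns into the partition key.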

You can use CQL Tracing to analyze your performance issues - https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshTracing.html

Depending on your data distribution, you might end up in a "wide row" scenario, with many records in a single version partition. Reading a very large partition can take time.
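
A rough way to see how wide the partitions get is to count rows per partition key value. The sketch below does this in plain Python on made-up sample data (3 versions with 1,000 ids each, which is an assumption for illustration, not the real distribution); `partition_sizes` is a hypothetical helper, not a driver API.

```python
from collections import Counter


def partition_sizes(rows, partition_columns):
    """Count how many rows fall into each partition for the given key columns."""
    return Counter(tuple(row[c] for c in partition_columns) for row in rows)


# Hypothetical sample data: 3 versions, 1,000 ids per version.
rows = [{"version": v, "id": str(i)} for v in range(3) for i in range(1000)]

by_version = partition_sizes(rows, ["version"])
by_version_and_id = partition_sizes(rows, ["version", "id"])

# Rows in the widest partition when the partition key is version alone.
print(max(by_version.values()))
# Rows in the widest partition when the partition key is (version, id).
print(max(by_version_and_id.values()))
```

With version alone as the partition key, every row of a version lands in the same partition, so the partition grows with the number of ids in that version.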

Oded Peer
  • It might; that depends on the client making the request. It will consume a lot of resources on the database. – Oded Peer Jan 22 '19 at 17:35
  • If you are querying by the primary key columns using equality, why are you using a compound key with a clustering column? You should model your table to match your queries. In this case I would use both "version" and "id" as a composite partition key, so that each query reads a single partition. – Oded Peer Jan 22 '19 at 17:39
  • So, if my version has, say, 1 billion rows and there can be millions of versions, what data model would you suggest? Yes, I will only query by version and id. – aviral sanjay Jan 22 '19 at 19:14
  • Use a composite partition key, with both columns in the partition key (note the double parentheses) - CREATE TABLE IF NOT EXISTS {} (id varchar, version int, row_hash varchar, PRIMARY KEY((version, id))) – Oded Peer Jan 23 '19 at 08:19
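
For reference, the composite partition key suggestion could look like this in the question's Python style. This is a sketch under assumptions: the helper names and the table name `row_hashes` are made up, and the `?` markers are CQL bind markers that would be filled in through a prepared statement in the driver rather than with string formatting.

```python
def create_table_cql(table_name):
    """Build the CREATE TABLE statement with (version, id) as the partition key."""
    # The double parentheses put both columns into the partition key,
    # so there is no clustering column.
    return (
        "CREATE TABLE IF NOT EXISTS {} "
        "(id varchar, version int, row_hash varchar, "
        "PRIMARY KEY((version, id)))"
    ).format(table_name)


def row_check_cql(table_name):
    """Build the lookup query; both partition key columns are matched by equality."""
    # The ? markers are bind placeholders for a prepared statement.
    return "SELECT row_hash FROM {} WHERE version = ? AND id = ?".format(table_name)


print(create_table_cql("row_hashes"))
print(row_check_cql("row_hashes"))
```

With this key, a query must always supply both version and id, which matches the stated access pattern of querying only by version and id.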