I want to create a key-value datastore where the key is a URL and the value is around 0.5 MB of data. The application needs to write and read about 10-20K key-values at a time, which come from a file. What would a correct schema look like? If there is no clustering key, every partition will have only one row. Will this be OK for reading 20K records in an unlogged batch?
- Cassandra is really good at ingesting data quickly, and at reading by a specific key. It's definitely not the right tool for reading 20K K/Vs at once. With each pair having its own partition key, your pairs will be spread across all of your nodes, and attempts to read many of them will almost assuredly result in timeouts. – Aaron Aug 10 '18 at 13:01
- Then how should I design the schema for my use case? I was thinking of generating a hash code from the URL as the partition key, so that each partition will have multiple records; later, while fetching, I can use an IN clause. – vermaji Aug 11 '18 at 06:34
- There isn't a good Cassandra data model that's going to work for this. A K/V based model can work, until you start talking about reading 20k values at a time. Even Redis doesn't work well with that. Also, BATCH is a misnomer; it should really be named ATOMIC, because it's meant for applying writes atomically to 5 or 6 different tables. You gain **NOTHING** by using a batch query to read, except the possibility of crashing whichever node gets chosen as the coordinator. Seriously, Postgres is probably a better fit for this case than Cassandra. – Aaron Aug 11 '18 at 13:54
1 Answer
Using an unlogged batch for this is a very bad idea. Batching in Cassandra is useful only in a limited set of cases.
In your case the most effective approach is to send individual queries via executeAsync
and then collect the results in your application. But you may need to control how many requests are in flight at the same time, and possibly tune connection pooling.
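The pattern above (one async query per key, with a cap on concurrent requests) can be sketched as follows. This is an illustrative asyncio sketch, not driver code: `fetch_one` is a placeholder standing in for a real async single-partition read (e.g. `executeAsync` in the Java driver, or `execute_async` in the DataStax Python driver).

```python
import asyncio

async def fetch_one(key):
    # Placeholder for an async single-partition read, e.g.
    # SELECT value FROM kv WHERE url = ?  (hypothetical table)
    await asyncio.sleep(0)  # simulate non-blocking I/O
    return key, f"value-for-{key}"

async def fetch_all(keys, max_in_flight=128):
    # The semaphore caps how many requests are in flight at once,
    # so 20K keys don't hit the cluster simultaneously.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(key):
        async with sem:  # waits when max_in_flight requests are pending
            return await fetch_one(key)

    # One task per key; gather returns results in input order.
    return await asyncio.gather(*(bounded(k) for k in keys))

results = asyncio.run(fetch_all([f"url-{i}" for i in range(20000)]))
```

With a real driver, you would also prepare the statement once and reuse it, and tune `max_in_flight` to what your connection pool and cluster can absorb.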

Alex Ott