I want to create a key-value datastore where the key is a URL and the value is around 0.5 MB of data. The application needs to write and read about 10-20K key-values at a time, which come from a file. What would a correct schema look like? If there is no clustering key, every partition will have only one row. Will this be OK for reading 20K records in an unlogged batch?
- Cassandra is really good at ingesting data quickly, and at reading by a specific key. It's definitely not the right tool for reading 20K K/Vs at once. With each pair having its own partition key, your pairs will be spread across all of your nodes, and attempts to read many of them will almost assuredly result in timeouts. – Aaron Aug 10 '18 at 13:01
- Then how should I design the schema for my use case? I was thinking of generating a hash code from the URL as the partition key, so that each partition will have multiple records; later, while fetching, I can use an IN clause. – vermaji Aug 11 '18 at 06:34
- There isn't a good Cassandra data model that's going to work for this. A K/V based model can work, until you start talking about reading 20k values at a time. Even Redis doesn't work well with that. Also, BATCH is a misnomer; it should really be named ATOMIC, because it's meant for applying writes atomically to 5 or 6 different tables. You gain **NOTHING** by using a batch query to read, except the possibility of crashing whichever node gets chosen as the coordinator. Seriously, Postgres is probably a better fit for this case than Cassandra. – Aaron Aug 11 '18 at 13:54
1 Answer
Using an unlogged batch for this is a very bad idea. Batching in Cassandra is useful only in a limited set of cases.
In your case the most effective approach is to send individual queries via executeAsync
and then collect the results in your application. But you may need to control how many requests are in flight at the same time, and possibly tune connection pooling.
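The pattern above (one async query per key, with a cap on concurrent requests) can be sketched as follows. This is an illustrative asyncio sketch, not driver code: `fetch_one` is a placeholder standing in for a real async single-partition read (e.g. `executeAsync` in the Java driver, or `execute_async` in the DataStax Python driver).

```python
import asyncio

async def fetch_one(key):
    # Placeholder for an async single-partition read, e.g.
    # SELECT value FROM kv WHERE url = ?  (hypothetical table)
    await asyncio.sleep(0)  # simulate non-blocking I/O
    return key, f"value-for-{key}"

async def fetch_all(keys, max_in_flight=128):
    # The semaphore caps how many requests are in flight at once,
    # so 20K keys don't hit the cluster simultaneously.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(key):
        async with sem:  # waits when max_in_flight requests are pending
            return await fetch_one(key)

    # One task per key; gather returns results in input order.
    return await asyncio.gather(*(bounded(k) for k in keys))

results = asyncio.run(fetch_all([f"url-{i}" for i in range(20000)]))
```

With a real driver, you would also prepare the statement once and reuse it, and tune `max_in_flight` to what your connection pool and cluster can absorb.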

Alex Ott