I'm using AWS Keyspaces with C#. First I select from the table to get the partition keys, and then I try to delete many rows from the table with a WHERE clause:

    var daysToDelete = DateTimeOffset.UtcNow.AddDays(-1);
    foreach (var result in selectResult)
    {
        Cql deleteQuery = new Cql(
            "WHERE interfaceid = ? and environment = ? and transactionguid < ?",
            result.InterfaceId,
            result.Environment,
            TimeUuid.Min(daysToDelete)).WithOptions(o => o.SetPageSize(100));
        mapper.Delete<Transaction>(deleteQuery);
    }

That's about 3k - 6k rows, and when I try to delete them I get this error:

"Range delete requests are limited in the amount of items that can be deleted in a single range"

How can I solve this issue?

Taifunov

2 Answers

It's a known limitation of AWS Keyspaces. One possible solution could be:

  • Select transactionguid with WHERE interfaceid = ? and environment = ? and transactionguid < ?
  • Iterate over the results, remembering the transactionguid each time you cross a boundary of one thousand rows, two thousand rows, and so on.
  • Iterate over the remembered boundaries, performing a delete for each one.
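
The boundary bookkeeping in these steps could look roughly like the sketch below. It is plain C# with no driver calls; the class name `RangeDeleteBoundaries` and the method `Compute` are hypothetical names made up for illustration. It assumes the keys arrive in ascending clustering order and that the deletes are then issued in the order the bounds are returned.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any driver.
public static class RangeDeleteBoundaries
{
    // Yields upper bounds for successive "clusteringKey < ?" deletes.
    // Assumption: orderedKeys is in ascending clustering order and every key
    // is strictly below finalBound. If the deletes run in the yielded order,
    // each one covers at most chunkSize rows that are still present.
    public static IEnumerable<T> Compute<T>(
        IEnumerable<T> orderedKeys, T finalBound, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int seen = 0;
        foreach (T key in orderedKeys)
        {
            // This key starts a new chunk, so "< key" covers exactly the
            // chunkSize rows of the previous chunk.
            if (seen > 0 && seen % chunkSize == 0)
                yield return key;
            seen++;
        }

        // The original cutoff removes whatever is left (at most chunkSize rows).
        yield return finalBound;
    }
}
```

Each returned bound would then take the place of the last parameter in the `Cql` from the question, one `mapper.Delete<Transaction>` call per bound. Picking a chunk size somewhat below 1,000 leaves a margin for rows written between the select and the deletes. If the table's clustering order is descending, the keys would have to be reversed first.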

P.S. Why not use a more compatible cloud Cassandra, such as DataStax Astra? AWS Keyspaces isn't true Cassandra, so you will always need to handle its limitations yourself.

Alex Ott

With Amazon Keyspaces, you can delete up to 1,000 rows within a range in a single operation. To delete more than 1,000 rows within a single partition, it's best to break the operation up into smaller ranges or iterate over primary keys.

  • Delete by partition – if most partitions hold fewer than 1,000 rows, try deleting whole partitions first. If a partition contains more than 1,000 rows, fall back to deleting by clustering column.
  • Delete by clustering column – if your model contains multiple clustering columns, you can use the column hierarchy to delete multiple rows. Clustering columns form a nested structure, so you can delete many rows at once by filtering on the top-level column.
  • Delete by individual row – iterate through the items and delete each row by its full primary key (partition columns and clustering columns).
  • Also consider splitting your rows over many partitions. In NoSQL, it's best to distribute your throughput across table partitions. This spreads data and access evenly across physical resources, which gives the best throughput.
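
Since the clustering column in the question is a timeuuid, one way to get "smaller ranges" is to slice the time window into fixed-width pieces and delete each piece with both a lower and an upper bound. The sketch below only produces the slices; `TimeSlices` and `Split` are hypothetical names, and it assumes you know (or can look up) a start time and can pick a width narrow enough that each slice holds fewer than 1,000 rows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any driver.
public static class TimeSlices
{
    // Splits [start, end) into consecutive half-open windows of at most `width`.
    public static IEnumerable<(DateTimeOffset From, DateTimeOffset To)> Split(
        DateTimeOffset start, DateTimeOffset end, TimeSpan width)
    {
        if (width <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(width));

        DateTimeOffset from = start;
        while (from < end)
        {
            // Clamp the last window so it never runs past `end`.
            DateTimeOffset to = from + width;
            if (to > end) to = end;
            yield return (from, to);
            from = to;
        }
    }
}
```

Each window would map to a delete with `transactionguid >= ? and transactionguid < ?`, passing `TimeUuid.Min(window.From)` and `TimeUuid.Min(window.To)`. If a slice still hits the limit, it can be split again with a smaller width.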

Also take the following into consideration for delete-heavy workloads.

  • With Amazon Keyspaces, CQL partitions can contain a virtually unbounded number of rows. This allows you to scale partitions “wider” than the traditional Cassandra guidance of 100 MB. It’s not uncommon for time series or ledgers to grow to gigabytes of data over time.
  • With Amazon Keyspaces, there are no compaction strategies or tombstones to consider when you have delete heavy workloads. You can delete as much as you like without impacting read performance.
MikeJPR