
Is there a way to set a TTL for existing rows in a table without re-inserting all the data?

All the documentation only shows examples of setting a custom/default TTL when inserting a record: https://docs.aws.amazon.com/keyspaces/latest/devguide/TTL-how-to.html

ALTER TABLE "my_keyspace"."my_table" WITH default_time_to_live = 31536000;

This sets the default TTL for new records only; existing rows are unaffected.
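For example (key and column names are placeholders), the TTL() function in CQL shows that rows written before the ALTER still have no TTL:

SELECT TTL(col1) FROM "my_keyspace"."my_table" WHERE pk = 'existing-key';
-- returns null for rows inserted before any TTL was in effect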

Mindaugas K.

3 Answers


You can't change the TTL for existing records without reinserting them, but you can use a tool like DSBulk to unload the data and then load it back while setting a TTL. There is a load example that uses "USING TTL" here: https://docs.datastax.com/en/dsbulk/docs/reference/dsbulkLoad.html
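As a minimal sketch of that unload-then-reload cycle (keyspace, table, path, and column names are placeholders; verify the exact options against the DSBulk docs for your version):

dsbulk unload -k my_keyspace -t my_table -url /tmp/my_table_export

dsbulk load -url /tmp/my_table_export \
  -query "INSERT INTO my_keyspace.my_table (col1, col2, col3) VALUES (:col1, :col2, :col3) USING TTL 864000"

Note that when you pass a custom -query, you do not also pass -k/-t, since the keyspace and table are already named in the statement.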

Setting a default TTL on each heavily used table is a good idea; you can always override the default TTL when inserting data, for example:

INSERT INTO keyspace.table (col1, col2, col3) VALUES ('coltext1', 'coltext2', 'coltext3') USING TTL 864000;
Paul
  • If you need to preserve the original writetime when reinserting, DSBulk allows that as well, and you can set a TTL at the same time; see the article here: https://support.datastax.com/s/article/DSBULK-Unload-and-Load-with-Original-Write-Time-and-TTL-Data – Paul Nov 18 '22 at 14:47
  • Thank you for your answer... However, again, the problem is that the DSBulk tool does not work with the AWS Keyspaces com.amazonaws.cassandra.DefaultPartitioner; error: https://stackoverflow.com/questions/74158451/aws-keyspace-dsbulk-unload-failed-token-metadata-not-present/74163262?noredirect=1#comment130965705_74163262 – Mindaugas K. Nov 21 '22 at 10:21
  • I see, yes, if you use the AWS default partitioner, then you can't use DSBulk. You'd have to write a program to reinsert the rows. If you need to preserve the writetime, you can still do that in the INSERT statement with USING TIMESTAMP alongside setting the new TTL, as sketched below. Hopefully DSBulk will support this in the near future. – Paul Nov 21 '22 at 17:26
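For reference, a single CQL statement can set both the new TTL and the preserved write time (values are placeholders; the timestamp is in microseconds, as returned by WRITETIME()):

INSERT INTO my_keyspace.my_table (col1, col2, col3) VALUES ('coltext1', 'coltext2', 'coltext3') USING TTL 864000 AND TIMESTAMP 1669043246000000;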

There is no way to update the TTL on existing data without re-inserting it with a new TTL.

The normal way of handling this is to develop an ETL app, usually with Apache Spark. With the Spark Cassandra connector, you will need to write an app that iterates over the partitions in the table, retrieves both the cell values and their WRITETIME(), then re-inserts the data with a new TTL using the same WRITETIME() timestamp.
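A rough sketch of that approach with the connector's RDD API might look like the following. Keyspace, table, and column names are placeholders, sc is the SparkContext, and the per-row timestamp wiring via TimestampOption.perRow is my reading of the connector's WriteConf, so check the connector docs before relying on it:

import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.{WriteConf, TTLOption, TimestampOption}

// Read each row's cell value together with its original write time
// (WRITETIME() values are in microseconds).
val rows = sc.cassandraTable("my_keyspace", "my_table")
  .select("pk", "col1", "col1".writeTime as "wt")

// Re-insert with a new TTL while replaying the original write time per row.
rows.saveToCassandra(
  "my_keyspace", "my_table",
  SomeColumns("pk", "col1"),
  writeConf = WriteConf(
    ttl = TTLOption.constant(31536000),
    timestamp = TimestampOption.perRow("wt")))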

Note, however, that the Spark connector only works on clusters which use the Murmur3Partitioner or RandomPartitioner (see the supported spark-cassandra-connector partitioners here). The Spark connector does not work with Keyspaces' com.amazonaws.cassandra.DefaultPartitioner, so you will need to make sure that the default partitioner for your account is supported. See Configuring Amazon Keyspaces for integration with the spark-cassandra-connector for details.

If your Amazon Keyspaces database uses the unsupported partitioner, your only option is to write your own app that rewrites the data with a new TTL. Cheers!

Erick Ramirez

What you will want to do is use AWS Glue with the Spark Cassandra connector. This will be fully serverless end-to-end and will scale with your table size. The Spark Cassandra Connector with the Murmur3Partitioner is compatible with Amazon Keyspaces. See this repo for examples of exports and imports:

val myTable = sparkSession.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> tableName, "keyspace" -> keyspaceName))
  .load()

// Try first without the shuffling step. If you see WriteThrottleEvents, then reading
// by partition and writing by partition may be causing hot keys. This can happen with
// wider partitions. Randomizing the data will avoid WriteThrottleEvents.
// The following commands will randomize the data:
// import org.apache.spark.sql.functions.rand
// val shuffledData = myTable.orderBy(rand())

myTable.write
  .format("org.apache.spark.sql.cassandra")
  .mode("append")
  .option("keyspace", keyspaceName)
  .option("table", tableName)
  .option("ttl", 999999)
  .save()
MikeJPR