
I can use this command to export all the data to CSV:

COPY data TO '/usr/local/cassandra/my_data.csv';

But now I want to export only the data for a specific time range to CSV, i.e. rows matching a condition like this:

select * from data where upload_time >= '2020-10-16 00:00:00.000+00' and upload_time < '2020-10-17 00:00:00.000+0000' allow filtering;

What command should I use to do this?

  • Does this answer your question? [Cassandra selective copy](https://stackoverflow.com/questions/18452158/cassandra-selective-copy) – sazzad Nov 03 '20 at 06:25
  • Hi, I saw that question, but something goes wrong when I try the echo approach. –  Nov 03 '20 at 06:39

1 Answer


You can use the DSBulk utility with a custom query, but you need to be careful to include the token-range condition shown below, so the full scan is split across token ranges instead of being executed as a single unbounded query (see this blog post for details).

Something like the following (replace pk with the name of the actual partition key column, and put the query string back on a single line - it is split here only for readability):

dsbulk unload -url data.csv \
-query "SELECT * FROM ks.table WHERE token(pk) > :start AND token(pk) <= :end 
  AND upload_time >='2020-01-01 00:00:00.000+00' 
  AND upload_time <'2021-01-01 00:00:00.000+0000' allow filtering" 
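DSBulk substitutes the :start and :end placeholders with the bounds of each token range it reads and processes the ranges in parallel - that is what keeps this full scan efficient despite the allow filtering clause.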

Another way is to use Spark with the Spark Cassandra Connector (even in local master mode) - it does the same token-range scanning under the hood. Something like this (example for spark-shell in Scala; a pyspark sketch follows below):

import org.apache.spark.sql.cassandra._

// read the table as a DataFrame; the connector splits the scan by token ranges
val data = spark.read.cassandraFormat("table", "keyspace").load()
// filter on the timestamp column
val filtered = data.filter("upload_time >= cast('2020-01-01 00:00:00.000+00' as timestamp) AND upload_time <= cast('2021-01-01 00:00:00.000+0000' as timestamp)")
// write the result as CSV (Spark produces a directory of part files)
filtered.write.format("csv").save("data.csv")
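
For completeness, here is a rough pyspark equivalent - a minimal sketch, assuming pyspark was started with the Spark Cassandra Connector package on the classpath (the connector version below and the keyspace/table names are placeholders):

# start pyspark with e.g.: --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0
data = (spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(table="table", keyspace="keyspace")
        .load())

# same timestamp filter as in the Scala version
filtered = data.filter("upload_time >= cast('2020-01-01 00:00:00.000+00' as timestamp) AND upload_time <= cast('2021-01-01 00:00:00.000+0000' as timestamp)")

# write the result as CSV
filtered.write.format("csv").save("data.csv")
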
Alex Ott