We have two Cassandra clusters: the first holds the old data and the second holds the new data.

We now want to move or copy the old data from the first cluster to the second. What is the best way to do this, and how is it done?

We are using DSE 3.1.4.

Ram
  • Do these clusters have the same keyspace configured? What is the replication? Do you want all of the data replicated between the two clusters or only to reside on the second cluster? – RussS Nov 15 '13 at 15:43
  • The name of the keyspace has to be changed. The table in the first cluster has a single-column primary key, while the table in the second cluster, which we need to copy the data into, has a composite primary key. – Ram Nov 19 '13 at 02:48

2 Answers

One tool you could try would be the COPY TO/FROM cqlsh command.

On a node in the old cluster, you would use the COPY TO:

cqlsh> COPY myTable (col1, col2, col3, col4) TO 'temp.csv';

And then (after copying the file over) on a node in your new cluster, you would copy the data in the CSV file into Cassandra:

cqlsh> COPY myTable (col1, col2, col3, col4) FROM 'temp.csv';
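
Since the question mentions that the target table has a composite primary key, it may be worth checking the exported CSV before importing it, because Cassandra does not accept rows with a missing primary key column. The following is only a rough sketch in Python, not part of the COPY tooling: the column names come from the example above, the choice of col1 and col2 as key columns is an assumption for illustration, and it assumes missing values were exported as empty fields.

```python
import csv
import io

# Column order used in the COPY ... TO statement (names from the example above).
COLUMNS = ["col1", "col2", "col3", "col4"]
# Assumption: these columns form the composite primary key in the new table.
KEY_COLUMNS = ["col1", "col2"]


def rows_missing_keys(csv_file, columns=COLUMNS, key_columns=KEY_COLUMNS):
    """Return 1-based numbers of CSV rows that lack a primary key value."""
    key_indexes = [columns.index(name) for name in key_columns]
    bad_rows = []
    for row_number, row in enumerate(csv.reader(csv_file), start=1):
        # Flag rows with the wrong number of fields or an empty key field.
        if len(row) != len(columns) or any(row[i] == "" for i in key_indexes):
            bad_rows.append(row_number)
    return bad_rows


if __name__ == "__main__":
    # Small in-memory sample standing in for temp.csv.
    sample = io.StringIO("a,1,x,y\nb,,x,y\n,3,x,y\nc,4,,\n")
    print(rows_missing_keys(sample))
```

For the real file you would pass open('temp.csv', newline='') instead of the in-memory sample, then fix or drop the reported rows before running COPY FROM.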

Here is some more documentation on the COPY command.

Note that COPY TO/FROM is recommended for tables that contain a few million rows or fewer. For larger datasets you should look at:

Aaron
  • Sorry, I forgot to mention earlier that I have a different schema (with an added composite primary key) in the new cluster. Will this still work? – Ram Nov 18 '13 at 23:02
  • I have 350GB of data. Which will be faster? – Ram Nov 18 '13 at 23:03
  • If the schema is different, you can specify how you want to map each column, so that's ok. Is the 350GB all one table? If so, you could try using COPY, but I think the Bulk Loader might be your best bet. – Aaron Nov 19 '13 at 12:29
  • 350GB on whole cluster. It's not in one table – Ram Nov 19 '13 at 14:03
  • If it's 350GB for the whole cluster, then definitely give COPY a try. – Aaron Nov 19 '13 at 14:58
  • Well, it looks like COPY is your only option, as that should allow you to modify the column/key names. – Aaron Nov 21 '13 at 22:37

There is a tool called /usr/bin/sstableloader for copying data between clusters. However, when I used it months ago, I encountered an error and used this instead. Since that was a long time ago, sstableloader might have been fixed already.

Roman Tumaykin