Is there a way to get a diff of data stored in 2 column families in Cassandra?

Question

We are migrating data from one column family to another, so we need to verify that the target column family holds exactly the same data as the source column family. In other words, the diff of the two CFs should be empty. Is there a way to produce such a diff?

score 1 · Answer 1 · answered Aug 27 '15 at 00:14

If your tables are not too large, you could export the contents of both tables to CSV files, sort the CSV files, and then diff the sorted files.

With the COPY command you can specify which columns you are interested in and the order in which they appear in the CSV files, for example:

cqlsh> COPY table1 (old_col1, old_col2, old_col3) TO 'table1.csv';
cqlsh> COPY table2 (new_col1, new_col2, new_col3) TO 'table2.csv';

diff <(sort table1.csv) <(sort table2.csv)

If the table is gigantic and both tables will be in Cassandra at the same time, you could write an application that pages through the first table and, for each row, reads the corresponding key in the second table and compares the two rows. Then repeat in the other direction, paging through the second table and reading the corresponding key in the first, so that rows existing only in the second table are caught as well. Writing such an application is of course more work.
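The two-pass comparison can be sketched roughly as follows. This is a minimal illustration, not a tested migration tool: `diff_tables`, `scan_a`, `get_b`, `scan_b` and `get_a` are hypothetical names, and the sketch assumes each row has already been mapped to a common shape (the two tables may use different column names) and that each lookup function returns `None` for a missing key.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, Optional, Tuple

Row = Dict[str, object]
Mismatch = Tuple[Hashable, Optional[Row], Optional[Row]]


def diff_tables(
    scan_a: Iterable[Tuple[Hashable, Row]],
    get_b: Callable[[Hashable], Optional[Row]],
    scan_b: Iterable[Tuple[Hashable, Row]],
    get_a: Callable[[Hashable], Optional[Row]],
) -> Iterator[Mismatch]:
    """Yield (key, row_in_a, row_in_b) for every key whose rows differ.

    scan_* iterate over (key, row) pairs of one table (hypothetical helpers);
    get_* look up a single row by key in the other table, returning None
    when the key is absent.
    """
    # Pass 1: walk table A and look each key up in table B.
    # This reports rows missing from B and rows whose contents differ.
    for key, row_a in scan_a:
        row_b = get_b(key)
        if row_a != row_b:
            yield key, row_a, row_b

    # Pass 2: walk table B and look each key up in table A.
    # Differing rows were already reported in pass 1, so only
    # rows that are absent from A need to be reported here.
    for key, row_b in scan_b:
        if get_a(key) is None:
            yield key, None, row_b


if __name__ == "__main__":
    # In-memory stand-ins for the two tables, for illustration only.
    source = {1: {"v": "a"}, 2: {"v": "b"}, 3: {"v": "c"}}
    target = {1: {"v": "a"}, 2: {"v": "B"}, 4: {"v": "d"}}
    for mismatch in diff_tables(source.items(), target.get,
                                target.items(), source.get):
        print(mismatch)
```

Against a real cluster, the scan side would be a paged full-table query and the lookup side a query by primary key; how those are wired up depends on the driver you use. Because the function is a generator, it never holds more than one row from each table in memory at a time, which is the point of this approach for tables too large to export.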

I was hoping for a utility or third-party tool for this, like sqldiff for SQL tables. I had thought of the above solution, but as you also mention, it isn't much help with huge data. – schatter, Aug 27 '15 at 17:58
I haven't seen any third-party tools for that, but maybe someone will chime in with one. Otherwise you may be on your own to create something. – Jim Meyer, Aug 27 '15 at 18:07

score 0 · Accepted Answer · answered Sep 02 '15 at 17:06

I would consider using the sstable2json utility to export each table's SSTables to disk as JSON, and then running a standard Linux diff command on the exported JSON of both tables.

Documentation of sstable2json: http://docs.datastax.com/en/cassandra/1.2/cassandra/tools/toolsSStable2json_t.html

Chris Gerlt

This would certainly be useful. Thanks Chris. – schatter Sep 03 '15 at 23:50