What is the correct way to export/import data using cassandra-loader/cassandra-unloader for YugaByte DB on a table with JSONB column(s)

Question

I tried to use the steps described here https://docs.yugabyte.com/v1.1/manage/data-migration/cassandra/bulk-export/

wget https://github.com/YugaByte/cassandra-loader/releases/download/v0.0.27-yb-2/cassandra-loader
wget https://github.com/YugaByte/cassandra-loader/releases/download/v0.0.27-yb-2/cassandra-unloader
chmod a+x cassandra-unloader
chmod a+x cassandra-loader

Since the above tools are JVM-based, I installed OpenJDK:

sudo yum install java-1.8.0-openjdk

Then I exported the rows using:

% cd /home/yugabyte/entity
% ./cassandra-unloader -schema "my_ksp.my_table(id,type,details)" -host <tserver-ip> -f export.csv -numThreads 3
Total rows retrieved: 10000

Here, details is a JSONB column. Next, I created a new table, my_table_new, in the same cluster and tried to load this data into it:

./cassandra-loader -schema "my_ksp.my_table_new(id,type,details)" -host <tserver-ip> -f /home/yugabyte/entity -numThreads 3 -progressRate 200000 -numFutures 256 -rate 5000 -queryTimeout 65

But I get errors of the form:

Row has different number of fields (12) than expected (3)

It looks like the default delimiter "," in the CSV file is causing the issue, since the JSONB data in the CSV file also contains commas.

As an alternative, I tried passing -delim "\t" to cassandra-unloader, but that seems to insert the two characters "\" and "t" rather than a single tab character. Is that expected?

score 1 · Answer 1 · answered Sep 08 '19 at 05:50

You are correct that, with cassandra-unloader/cassandra-loader, the default delimiter (",") doesn't work in the presence of YCQL JSONB columns in Yugabyte DB.
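The effect can be illustrated with a small sketch. The sample row below is hypothetical, and the naive awk split is only a stand-in for the loader's parsing, not a reproduction of it; the point is just that commas inside the JSON inflate the field count, while a tab delimiter does not.

```shell
# Hypothetical sample row: id, type, and a JSONB value that contains commas.
row_csv='1,user,{"name":"a","tags":["x","y"]}'
row_tsv=$'1\tuser\t{"name":"a","tags":["x","y"]}'

# Splitting on "," over-counts fields because of the commas inside the JSON.
csv_fields=$(printf '%s\n' "$row_csv" | awk -F',' '{print NF}')

# Splitting on a tab yields the expected three fields.
tsv_fields=$(printf '%s\n' "$row_tsv" | awk -F'\t' '{print NF}')

echo "comma: $csv_fields, tab: $tsv_fields"   # prints "comma: 5, tab: 3"
```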

Regarding:

<< As an alternative, I tried passing -delim "\t" to cassandra-unloader, but that seems to insert the two characters "\" and "t" rather than a single tab character. Is that expected? >>

Using tab as the delimiter character should work. However, the Unix shell needs special quoting to pass a literal tab to the program: inside double quotes, bash passes "\t" through unchanged as the two characters "\" and "t". Please see: https://superuser.com/questions/362235/how-do-i-enter-a-literal-tab-character-in-a-bash-shell

Use: -delim $'\t' instead of -delim "\t"
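The difference between the two quoting forms can be checked directly in bash. This is a minimal sketch; the variable names are only for illustration, and it assumes a shell that supports ANSI-C quoting ($'...'), such as bash.

```shell
# Inside double quotes, "\t" stays as two characters: a backslash and a "t".
literal="\t"

# With ANSI-C quoting, $'\t' expands to a single tab character.
tab=$'\t'

echo "${#literal}"   # prints 2
echo "${#tab}"       # prints 1
```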

So for example for the export, try:

./cassandra-unloader -schema "my_ksp.my_table(id,type,details)" -host <tserver-ip> -f export.csv -numThreads 3 -delim $'\t'

and for the import, try:

./cassandra-loader -schema "my_ksp.my_table_new(id,type,details)" -host <tserver-ip> -f /home/yugabyte/entity -numThreads 3 -progressRate 200000 -numFutures 256 -rate 5000 -queryTimeout 65 -delim $'\t'
