Questions tagged [dsbulk]

DataStax Bulk Loader (DSBulk) is a unified open-source tool for loading into and unloading from Cassandra-compatible storage engines such as Apache Cassandra®, DataStax Astra, and DataStax Enterprise (DSE).

Out of the box, DSBulk provides the ability to:

  • Load (import) large amounts of data into the database efficiently and reliably;
  • Unload (export) large amounts of data from the database efficiently and reliably;
  • Count elements in a database table: how many rows in total, how many rows per replica and per token range, and how many rows in the top N largest partitions.
  • Currently, CSV and JSON formats are supported for both loading and unloading data; see the example commands below.
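
For a concrete feel, minimal example invocations might look like the following; keyspace, table, and file names here are placeholders, and option names should be checked against the dsbulk help for your version:

    # Load a CSV file (with a header row) into keyspace ks1, table table1
    dsbulk load -url /path/to/data.csv -k ks1 -t table1 -header true

    # Unload (export) the same table to a directory of CSV files
    dsbulk unload -k ks1 -t table1 -url /path/to/export_dir

    # Count the rows in the table
    dsbulk count -k ks1 -t table1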

GitHub: https://github.com/datastax/dsbulk

41 questions
3 votes, 1 answer

Is it possible to back up and restore a Cassandra cluster using dsbulk?

I searched the internet a lot and saw many ways to back up and restore a Cassandra cluster, such as nodetool snapshot and Medusa. But my question is: can I use dsbulk to back up a Cassandra cluster? What are its limitations? Why…
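
For context, what people usually mean here is a logical, per-table export and re-import rather than a file-level snapshot; by default such a round trip does not carry over per-cell metadata like writetimes and TTLs. A minimal sketch, with placeholder keyspace, table, and paths:

    # "Backup": export one table to CSV files
    dsbulk unload -k shop -t orders -url /backups/shop_orders

    # "Restore": re-import the exported files into the (re-created) table
    dsbulk load -k shop -t orders -url /backups/shop_orders
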
3 votes, 0 answers

Loading json data into Cassandra using dsbulk

I feel like the documentation on loading JSON files into Cassandra is really lacking in the dsbulk docs. Here is part of the JSON file that I'm trying to load: [ { "tags": [ "r" ], "owner": { "reputation": 23, "user_id":…
Itzblend
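
For reference, the JSON connector is selected with -c json; a minimal sketch, assuming the field names in each document match the table's column names (keyspace and table below are placeholders):

    # Load JSON documents into keyspace ks1, table users using the JSON connector
    # (if the whole file is a single JSON array rather than one document per line,
    #  check the connector.json mode setting in the dsbulk docs)
    dsbulk load -url /path/to/data.json -k ks1 -t users -c json
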
3 votes, 2 answers

Issue while loading data into Cassandra using dsbulk

I'm facing an issue while loading data into a table from a .csv file using dsbulk. I get the following in the error log: Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [/10.0.126.13:9042] Timed out waiting for server response This…
v parkar
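
Timeouts like this usually mean the cluster cannot keep up with dsbulk's default ingest rate; one common mitigation is to throttle the load. A minimal sketch (the rate is an arbitrary example value, and file/keyspace/table names are placeholders):

    # Throttle dsbulk to at most 1000 requests per second and retry the load
    dsbulk load -url /path/to/data.csv -k ks1 -t table1 --executor.maxPerSecond 1000
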
2 votes, 1 answer

DSBulk loader version 1.8: error in loading and connecting to Apache Cassandra

I installed Apache Cassandra and the DSBulk loader following the manual and everything looks OK, but when I try to load data with DSBulk there seems to be a connection problem between the database and DSBulk. Can someone tell me what happened and how I can solve it? That's…
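
When DSBulk cannot reach the node, the first things to check are the contact point and port; by default it tries 127.0.0.1:9042. A minimal sketch with a placeholder address and credentials:

    # Point dsbulk explicitly at the node's listen address (plus credentials, if auth is enabled)
    dsbulk load -url /path/to/file.csv -k keyspace_test -t table_test -h 10.0.0.5 -u cassandra -p cassandra
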
2 votes, 2 answers

How to load data into Apache Cassandra with Datastax Bulk loader (Ubuntu)?

When I want to upload data into my "Test Cluster" in Apache Cassandra, I open the terminal and then: export PATH=/home/mypc/dsbulk-1.7.0/bin:$PATH source ~/.bashrc dsbulk load -url /home/mypc/Desktop/test/file.csv -k keyspace_test -t…
wundolab
2 votes, 1 answer

Use dsbulk load in Python

I created a Cassandra database in DataStax Astra. I'm able to connect to it in Python (using the cassandra-driver module and the secure_connect_bundle). I wrote a few APIs in my Python application to query the database. I read that I can upload CSV to…
F.S.
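
For Astra specifically, dsbulk takes the secure connect bundle via -b, so a common pattern is for the Python code to shell out to a command like the following (bundle path, keyspace, table, client ID, and client secret are placeholders):

    # Load a CSV into an Astra database using the secure connect bundle
    dsbulk load -url /path/to/data.csv -k my_keyspace -t my_table \
      -b /path/to/secure-connect-mydb.zip -u my_client_id -p my_client_secret
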
2 votes, 1 answer

DSBulk with ScyllaDB

I'm trying to use DSBulk to load data into ScyllaDB. I know that officially DSBulk doesn't support Scylla, but I found a post from someone using it instead of cqlsh. When I try to connect, I always get this error: init query OPTIONS: error…
JonathanChaput
1 vote, 3 answers

Why do row counts per node differ for a 5-node cluster with a replication factor of 3?

I have 5 machines connected as nodes in a Cassandra distributed data system. I have set the replication factor to 3. I have understood that with a replication factor of 3, the data will be spread across 3 nodes based on the coordinator node's availability.…
prasanna
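
Related to this, dsbulk count can show how rows are distributed across the cluster; a minimal sketch, assuming the --stats.modes option of dsbulk count (keyspace and table are placeholders):

    # Report how many rows each node is hosting
    dsbulk count -k ks1 -t table1 --stats.modes hosts
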
1 vote, 2 answers

Is a CQL COUNT() on a single partition also an expensive operation?

I know Cassandra's count() is an expensive operation as it needs a complete table scan. https://www.datastax.com/blog/running-count-expensive-cassandra But let's say we have a table hotel with hotel_type as the partition key and we run the query select…
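
For comparison, a count restricted to a single partition only touches the replicas that own that partition key rather than scanning the whole table (it can still be slow for a very large partition). A minimal sketch with hypothetical keyspace, table, and value:

    # Count rows within one partition of the hotel table
    cqlsh -e "SELECT COUNT(*) FROM my_keyspace.hotel WHERE hotel_type = 'resort';"
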
1 vote, 1 answer

What is the correct CSV format for tuples when loading data with DSBulk?

I recently started using Cassandra for my new project and am doing some load testing. I have a scenario where I'm doing a dsbulk load using CSV like this: $ dsbulk load -url -k -t -h -u -p -header…
Senthil
1 vote, 1 answer

Cassandra dsbulk mapping failed

I am using dsbulk to load a dataset into DataStax Astra. Error message: my table structure: CREATE TABLE project( FL_DATE date, OP_CARRIER text, DEP_DELAY float, ARR_DELAY float, PRIMARY KEY ((FL_DATE), OP_CARRIER) ) WITH CLUSTERING ORDER…
james kam
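
Mapping failures often come down to the CSV header fields not lining up with the table's columns (or their types); dsbulk's -m option maps fields to columns explicitly. A minimal sketch, assuming hypothetical header names on the left and the question's columns on the right:

    # Map CSV header fields (left) to table columns (right) explicitly
    dsbulk load -url /path/to/flights.csv -k my_keyspace -t project -header true \
      -m "FL_DATE = fl_date, OP_CARRIER = op_carrier, DEP_DELAY = dep_delay, ARR_DELAY = arr_delay"
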
1 vote, 2 answers

AWS Keyspace DSBulk unload failed, "Token metadata not present"

Getting an error when trying to unload or count data from AWS Keyspaces using dsbulk. Error: Operation COUNT_20221021-192729-813222 failed: Token metadata not present. Command line: $ dsbulk count/unload -k my_best_storage -t book_awards -f…
1 vote, 1 answer

Using DSBulk to load into a CQL set returns "Invalid set literal - bind variables are not supported inside collection literals"

I'm trying to load a huge amount of data with dsbulk into a table with a set, using: dsbulk load test.json \ -h cassandra-db -u ... -p ... -k mykeyspace \ -query "update mykeyspace.mytable set value_s = value_s +{:value_s} where value_1=:value_1 and…
dank
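
That error comes from CQL itself: a bind marker cannot appear inside a collection literal such as {:value_s}. A commonly suggested workaround is to bind the entire collection instead and supply the field as a JSON array; a minimal sketch along those lines (names taken from the question, otherwise hypothetical):

    # Bind the whole set (value_s must arrive as a JSON array, e.g. ["a"]),
    # instead of placing the bind variable inside a set literal
    dsbulk load -url test.json -c json -k mykeyspace \
      -query "UPDATE mykeyspace.mytable SET value_s = value_s + :value_s WHERE value_1 = :value_1"
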
1 vote, 1 answer

How do I limit the files generated by DSBulk UNLOAD to just one CSV file?

I have run the command below on an EC2 instance to unload data from Cassandra and store it somewhere on the instance, but I observe that each dsbulk unload command generates 2 JSON files irrespective of how large or small the file size is. How do I…
Rahul Diggi
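
The number of output files an unload writes is governed by the connector's maxConcurrentFiles setting (the JSON connector has an equivalent option); a minimal sketch, assuming the CSV connector and placeholder names:

    # Force the unload to write a single output file instead of one per writer thread
    dsbulk unload -k my_keyspace -t my_table -url /home/ec2-user/export \
      --connector.csv.maxConcurrentFiles 1
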
1 vote, 2 answers

How to include a CQL SELECT statement in a dsbulk unload command

I have a huge orderhistory table in Cassandra with data going back to 2013, but I want only the last 12 months of orderhistory data to be unloaded. I use the command below, which unloads all the data starting from 2013 and stores it in the path…
Rahul Diggi
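
dsbulk unload accepts a full CQL query via -query, so the time filter can be pushed into the statement itself; a minimal sketch (keyspace, table, and column names are hypothetical, and filtering on a non-key column would also need ALLOW FILTERING):

    # Unload only the last 12 months of order history with a custom CQL query
    dsbulk unload -url /path/to/export \
      -query "SELECT * FROM my_keyspace.orderhistory WHERE order_date >= '2022-01-01' ALLOW FILTERING"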