Questions tagged [dsbulk]

The DataStax Bulk Loader (DSBulk) is an open-source, unified tool for loading into and unloading from Cassandra-compatible storage engines, such as open-source Apache Cassandra®, DataStax Astra and DataStax Enterprise (DSE).

Out of the box, DSBulk provides the ability to:

  • Load (import) large amounts of data into the database efficiently and reliably;
  • Unload (export) large amounts of data from the database efficiently and reliably;
  • Count elements in a database table: how many rows in total, how many rows per replica and per token range, and how many rows in the top N largest partitions.

Currently, CSV and JSON formats are supported for both loading and unloading data.

GitHub: https://github.com/datastax/dsbulk
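
For illustration, a minimal sketch of the three modes against a plain Cassandra cluster, assuming placeholder keyspace and table names (ks1, table1) and default connection settings:

  # Load rows from a local CSV file into ks1.table1
  dsbulk load -url data.csv -k ks1 -t table1

  # Unload the same table to a directory of CSV files
  dsbulk unload -k ks1 -t table1 -url /tmp/export

  # Count the rows in the table
  dsbulk count -k ks1 -t table1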

41 questions
1
vote
1 answer

DSBulk unloading 1TB of data from Kubernetes DSE Cluster fails

I am using DSBulk to unload data into CSV from a DSE cluster installed under Kubernetes. My cluster consists of 9 Kubernetes pods, each with 120 GB of RAM. I have monitored the resources while unloading the data and observed that the more the data is…
1
vote
1 answer

Exporting Cassandra table with DataStax Bulk Loader v1.8 complains about connection pool exhaustion

I run it with these settings: dsbulk unload -k keyspace -t table --connector.csv.delimiter "^" --engine.maxConcurrentQueries=4 --connector.csv.url ... The application complains about connection pool exhaustion --> the application gets timeouts on…
Dennis
  • 23
  • 4
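
For the pool-exhaustion scenario above, a commonly tried mitigation is to throttle the unload further and/or give the driver a larger connection pool. A hedged sketch, assuming DSBulk 1.8, placeholder keyspace/table names, and that driver settings can be passed through with the datastax-java-driver prefix:

  # Fewer parallel range reads, larger per-node connection pool
  dsbulk unload -k ks -t tbl --connector.csv.url /tmp/export \
    --engine.maxConcurrentQueries 2 \
    --datastax-java-driver.advanced.connection.pool.local.size 4
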
1
vote
0 answers

DSBulk is showing an authorization error, however the load is completing without any failed records

I am executing DSBulk; it actually runs and completes. In the output I do not see any failed records, but I see the below error message. Is this because permissions_validity_in_ms is set to 2000 ms? Should it be…
Rajib Deb
  • 1,496
  • 11
  • 30
1
vote
1 answer

I am getting a heap memory issue while running DSBULK load

I have unloaded more than 100 CSV files into a folder. When I try to load those files into Cassandra using DSBulk load, specifying the folder location of all these files, I get the below error: Exception: java.lang.OutOfMemoryError thrown from the…
Rajib Deb
  • 1,496
  • 11
  • 30
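
For the OutOfMemoryError above, the usual first step is to give the DSBulk JVM more heap. A hedged sketch, assuming the launcher honors the DSBULK_JAVA_OPTS environment variable (check your bin/dsbulk script) and using placeholder paths and names:

  # Raise the maximum heap to 4 GB before loading a whole folder of CSV files
  DSBULK_JAVA_OPTS="-Xmx4g" dsbulk load -url /path/to/csv_folder -k ks -t tbl
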
1
vote
2 answers

dsbulk unload is failing on large table

Trying to unload data from a huge table; below is the command used and its output. $ /home/cassandra/dsbulk-1.8.0/bin/dsbulk unload --driver.auth.provider PlainTextAuthProvider --driver.auth.username xxxx --driver.auth.password xxxx…
nmakb
  • 1,069
  • 1
  • 17
  • 35
1
vote
0 answers

Why might DSBulk Load stop operation without any errors?

I have created a Cassandra database in DataStax Astra and am trying to load a CSV file using DSBulk in Windows. However, when I run the dsbulk load command, the operation neither completes nor fails. I receive no error message at all, and I have to…
1
vote
0 answers

How to batch Cassandra dsbulk loading version 1.7

I'm trying to load a large CSV (30 GB) file into my cluster. I'm realizing that I might be overloading my Cassandra driver which is causing it to crash at some point during loading. I am getting a repeated message while it loads the data, until a…
Epsilon_Delta
  • 135
  • 1
  • 6
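
For the overload-while-loading scenario above, the most direct lever is to cap DSBulk's request rate before tuning batching. A hedged sketch with placeholder names, assuming the executor.maxPerSecond setting present in DSBulk 1.x:

  # Throttle the load to at most 5000 requests per second
  dsbulk load -url big.csv -k ks -t tbl --executor.maxPerSecond 5000
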
1
vote
1 answer

dsbulk unload missing data

I'm using dsbulk 1.6.0 to unload data from Cassandra 3.11.3. Each unload results in wildly different counts of rows. Here are results from 3 invocations of unload, on the same cluster, connecting to the same Cassandra host. The table being unloaded…
Tim
  • 4,560
  • 2
  • 40
  • 64
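
When unload row counts differ between runs, a common cross-check is DSBulk's own count mode at a stronger consistency level. A hedged sketch with placeholder names, assuming the -cl shortcut maps to the driver's consistency-level setting:

  # Count rows server-side for comparison with the unloaded files
  dsbulk count -k ks -t tbl -cl LOCAL_QUORUM

  # Re-run the unload at the same consistency level
  dsbulk unload -k ks -t tbl -url /tmp/export -cl LOCAL_QUORUM
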
1
vote
1 answer

How do I run dsbulk unload and write directly to S3

I want to run a dsbulk unload command, but my Cassandra cluster has ~1 TB of data in the table I want to export. Is there a way to run the dsbulk unload command and stream the data into S3 as opposed to writing to disk? I'm running the following…
Wonger
  • 285
  • 6
  • 18
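
A hedged sketch of streaming an unload straight to S3, assuming that with no -url option this DSBulk version writes the unloaded rows to stdout (progress messages go to stderr) and that the AWS CLI is installed; the bucket and key are placeholders:

  # Pipe the unloaded CSV into an S3 object instead of writing to local disk
  dsbulk unload -k ks -t tbl | aws s3 cp - s3://my-bucket/export/tbl.csv
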
1
vote
1 answer

DataStax Bulk Loader for Apache Cassandra isn't installing on Windows

I'm trying to install DataStax Bulk Loader on my Windows machine in order to import a JSON file into a Cassandra database. I just followed the installation instructions from the official website, which amount to unpacking the folder. Running dsbulk from any directory…
GecKo
  • 143
  • 1
  • 11
1
vote
1 answer

Datastax Bulk Loader for Apache Cassandra not installing

I have followed the instructions in the documentation: https://docs.datastax.com/en/dsbulk/doc/dsbulk/install/dsbulkInstall.html However, after doing the following: curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.6.0.tar.gz and tar -xzvf…
joecode
  • 25
  • 4
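
For reference, the documented install really is just download, extract, and run the launcher from the extracted bin directory (or add it to PATH). A hedged sketch; the --version flag is an assumption about this release:

  curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.6.0.tar.gz
  tar -xzvf dsbulk-1.6.0.tar.gz
  # Run the launcher directly from the extracted directory
  ./dsbulk-1.6.0/bin/dsbulk --version
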
1
vote
1 answer

How to import data into Cassandra on EC2 using DSBulk Loader

I'm attempting to import data into Cassandra on EC2 using dsbulk loader. I have three nodes configured and communicating as follows: UN 172.31.37.60 247.91 KiB 256 35.9% 7fdfe44d-ce42-45c5-bb6b-c3e8377b0eba 2a UN …
tpooch21
  • 59
  • 1
  • 5
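
For an EC2 cluster like the one above, DSBulk only needs one reachable contact point; the driver discovers the rest of the ring. A hedged sketch reusing the node address from the question, with placeholder file, keyspace and table names:

  # Point DSBulk at a single node of the cluster
  dsbulk load -h '172.31.37.60' -url data.csv -k ks -t tbl
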
1
vote
2 answers

First steps on loading data into Cassandra with dsbulk

I am following this guide on setting up dsbulk: https://docs.datastax.com/en/dsbulk/doc/dsbulk/dsbulkSimpleLoad.html I'm getting confused at this part: dsbulk load -url export.csv -k ks1 -t table1 \ -b "path/to/secure-connect-database_name.zip"…
Itzblend
  • 107
  • 2
  • 9
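
For an Astra database, the secure connect bundle passed with -b replaces host and port settings, but credentials are still needed. A hedged sketch built on the command from the guide, with the client ID and secret as placeholders:

  # -b points at the secure connect bundle; -u/-p are the database credentials
  dsbulk load -url export.csv -k ks1 -t table1 \
    -b "path/to/secure-connect-database_name.zip" \
    -u my_client_id -p my_client_secret
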
1
vote
1 answer

Issue with dsbulk unload

I am getting the below messages while unloading using dsbulk, and I am not able to figure out what they mean: [s0|347101951|0] Error sending cancel request. This is not critical (the request will eventually time out server-side). (HeartbeatException:…
Rajib Deb
  • 1,496
  • 11
  • 30
0
votes
3 answers

How can I scan the entire cassandra table which has 10B entries and no indexing?

I have a Cassandra database which contains a table with over 10B entries and no indexes. I need to get every row and do some data grouping. However, I did it using Java and the Spring Boot framework, and it only scanned 2B records, which is the Cassandra limit on…
Anish
  • 79
  • 1
  • 1
  • 5
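
For a full scan of that size, one alternative to paging through the table from application code is to export it with DSBulk and group the data offline; unload reads the table token range by token range rather than as one huge query. A hedged sketch with placeholder names:

  # Export the whole table to a directory of CSV files for offline processing
  dsbulk unload -k ks -t big_table -url /data/big_table_export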