Questions tagged [dsbulk]

DataStax Bulk Loader (DSBulk) is an open-source tool for loading data into and unloading data from Apache Cassandra®, DataStax Astra and DataStax Enterprise (DSE).

The DataStax Bulk Loader tool (DSBulk) is a unified tool for loading data into and unloading data from Cassandra-compatible storage engines, such as OSS Apache Cassandra®, DataStax Astra and DataStax Enterprise (DSE).

Out of the box, DSBulk provides the ability to:

  • Load (import) large amounts of data into the database efficiently and reliably;
  • Unload (export) large amounts of data from the database efficiently and reliably;
  • Count elements in a database table: how many rows in total, how many rows per replica and per token range, and how many rows in the top N largest partitions.

CSV and JSON formats are currently supported for both loading and unloading data.
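
As a rough illustration of the three operations listed above, here is a minimal dry-run sketch. It only prints the `dsbulk load`, `dsbulk unload` and `dsbulk count` command lines unless `DRY_RUN=0` is set. The keyspace, table and file paths are hypothetical placeholders, the `run` helper is introduced here purely for illustration, and connection options (contact points, credentials, secure connect bundle) are deliberately left out.

```shell
#!/bin/sh
# Dry-run sketch: prints the dsbulk commands instead of running them.
# Set DRY_RUN=0 to execute (assumes dsbulk is on PATH and a cluster is reachable).
# KEYSPACE, TABLE and the /tmp paths are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"
KEYSPACE="${KEYSPACE:-my_keyspace}"
TABLE="${TABLE:-my_table}"

# Helper (illustrative only): echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Load (import) a CSV file into the table.
run dsbulk load -url /tmp/data.csv -k "$KEYSPACE" -t "$TABLE"

# Unload (export) the table to a directory of CSV files.
run dsbulk unload -url /tmp/export -k "$KEYSPACE" -t "$TABLE"

# Count the rows in the table.
run dsbulk count -k "$KEYSPACE" -t "$TABLE"
```

The `-url`, `-k` and `-t` shortcuts are the same ones that appear in the questions under this tag; anything beyond them should be checked against the DSBulk documentation for the installed version.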

GitHub: https://github.com/datastax/dsbulk

41 questions
0 votes, 2 answers

dsbulk to load in batches and improved throughput

I am running dsbulk to load a CSV into Cassandra. I tried with a CSV that has 2 million records, and dsbulk took almost 1 hr 6 mins to load the file into the DB.
total | failed | rows/s | p50ms | p99ms | p999ms | batches
2,000,000 | 0 | 500 |…
0 votes, 2 answers

DSBulk cannot connect to cluster to load CSV data

I am trying to load CSV files into a Cassandra cluster using the dsbulk utility. I have a local copy of the CSV file and am trying to connect to a remote cluster and load the CSV into the table. However, dsbulk is failing to recognise the remote cluster…
0 votes, 2 answers

Does DSBulk with maxErrors=0 retry failed queries?

I'm using dsbulk to load data into a Cassandra cluster. The configuration currently includes -maxErrors 0 to fail fast in case of any issue. It's not clear to me how the retry strategy defined by advanced.retry-policy.class = …
Ihar
0 votes, 2 answers

DSBulk CSV Load Failure to DataStax Astra Cassandra Database, missing file config.json

I am trying to load a csv into a database in DataStax Astra using the DSBulk tool. Here is the command I ran minus the sensitive details: dsbulk load -url D:\\App\\data.csv -k data -t data -b D:\\App\\secure-connect-myapp -u username -p…
0 votes, 1 answer

Does the DataStax dsbulk tool duplicate or upsert data when a previously loaded file is reloaded?

Does DataStax dsbulk duplicate or upsert data when a previously loaded file is reloaded?
0 votes, 1 answer

Location of driver.conf used for DSBULK to load data into Cassandra

I am using a configuration file as below to load data in Cassandra using DSBULK include…
Rajib Deb
0 votes, 1 answer

Using DSBulk for backup/restore takes too long

I use dsbulk for text-based backup and restore of a Cassandra cluster. I have created a Python script that backs up/restores all the tables in the Cassandra cluster using dsbulk load/unload, but it takes a long time even for a small amount of data due to new session…
0 votes, 1 answer

Datastax Bulk Loader can't find my SSL certificate

On my Windows machine I have CQLSH working and using a .cert file. Now I am starting to use DSBulk, but can't get the command line to know where to find my certificate. I have a cert file here: C:\myfolder\mycert.cer Here is a sample of my command…
0 votes, 1 answer

DataStax DSBulk - Difference between query / table unload

I'm using dsbulk to try to extract some data from our Cassandra cluster, and seeing some odd behavior. Trying to understand if this is expected. If I perform an unload by specifying tablespace and table, I'm seeing different (fewer) results than if…
Mike Whitis
0 votes, 1 answer

How to install dsbulk on mac?

I have been following the official documentation for installing dsbulk loader but in vain. In the documentation, it says download and install but all that is instructed is to download and extract the zip file. However, typing dsbulk in any directory…
aviral sanjay
0 votes, 1 answer

Cassandra bulk load dsbulk - set load issue

Trying to load a CSV file into DSE Cassandra using the dsbulk utility. I am running into issues if the column is defined as a set. The COPY command successfully loads "{'bible', 'moses', 'ramses'}" & "{'televison'}". But dsbulk fails when there are…
Prak_Rum