I created a Cassandra database in DataStax Astra. I'm able to connect to it in Python (using the cassandra-driver module and the secure_connect_bundle). I wrote a few APIs in my Python application to query the database.
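For context, the connection itself follows the standard cassandra-driver pattern for Astra, roughly like this (the bundle path and credentials below are placeholders):

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Placeholder bundle path and credentials -- substitute real values.
cloud_config = {'secure_connect_bundle': 'secure-connect-afterpay.zip'}
auth_provider = PlainTextAuthProvider('username', 'password')

cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect('foo_keyspace')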
I read that I can upload a CSV to it using dsbulk. I am able to run the following command in Terminal, and it works.
dsbulk load -url data.csv -k foo_keyspace -t foo_table \
-b "secure-connect-afterpay.zip" -u username -p password -header true
Then I try to run the same command in Python using subprocess:
ret = subprocess.run(
    ['dsbulk', 'load', '-url', 'data.csv', '-k', 'foo_keyspace', '-t', 'foo_table',
     '-b', 'secure-connect-afterpay.zip', '-u', 'username', '-p', 'password',
     '-header', 'true'],
    capture_output=True
)
But I get FileNotFoundError: [Errno 2] No such file or directory: 'dsbulk': 'dsbulk'. Why is dsbulk not recognized when I run it from Python?
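My guess is that dsbulk is on my shell's PATH (it's a launcher script inside the dsbulk distribution's bin directory) but not on the PATH the Python process sees. Something like the sketch below would let me check, and fall back to an absolute path if needed (the install path is made up; adjust to wherever dsbulk was unpacked):

import shutil
import subprocess

# Does this Python process see the dsbulk launcher at all?
print(shutil.which('dsbulk'))  # None means it is not on this process's PATH

# If not, call the launcher by its absolute path instead (path is a placeholder).
ret = subprocess.run(
    ['/path/to/dsbulk/bin/dsbulk', 'load', '-url', 'data.csv',
     '-k', 'foo_keyspace', '-t', 'foo_table',
     '-b', 'secure-connect-afterpay.zip', '-u', 'username', '-p', 'password',
     '-header', 'true'],
    capture_output=True
)
print(ret.returncode, ret.stderr.decode())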
A related question: it's probably not best practice to rely on subprocess. Are there better ways to upload batch data to Cassandra?
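The only driver-based alternative I can think of is reading the CSV myself and inserting the rows with a prepared statement, roughly as sketched below (the column names are made up, and session is the cassandra-driver session from the connection code above), but I don't know whether that is actually better than shelling out to dsbulk:

import csv
from cassandra.concurrent import execute_concurrent_with_args

# Hypothetical columns -- replace with foo_table's real schema.
insert = session.prepare("INSERT INTO foo_table (col1, col2) VALUES (?, ?)")

with open('data.csv', newline='') as f:
    rows = [(r['col1'], r['col2']) for r in csv.DictReader(f)]

# Run the inserts concurrently rather than one blocking execute per row.
results = execute_concurrent_with_args(session, insert, rows, concurrency=50)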