
In my Python 3.8 application, I query a Cassandra database via the DataStax Python Driver 3.24.

I have several CQL operations that I am trying to execute as a single batch via BatchStatement, following the official documentation. Unfortunately, my code raises an error with the following content:

"errorMessage": "retry_policy should implement cassandra.policies.RetryPolicy"
"errorType": "ValueError"

As you can see from my code, I set a value for the retry_policy attribute of BatchStatement. Nevertheless, my code raises the error you see above. What kind of value does the retry_policy property expect, and what is the reason for this conflict?

Code Snippet:

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.auth import PlainTextAuthProvider
from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra import ConsistencyLevel
from cassandra.query import dict_factory
from cassandra.query import BatchStatement, SimpleStatement
from cassandra.policies import RetryPolicy


auth_provider = PlainTextAuthProvider(username=db_username, password=db_password)
default_profile = ExecutionProfile(
   load_balancing_policy=DCAwareRoundRobinPolicy(local_dc=db_local_dc),
   consistency_level=ConsistencyLevel.LOCAL_QUORUM,
   request_timeout=60,
   row_factory=dict_factory
)
cluster = Cluster(
   db_host,
   auth_provider=auth_provider,
   port=db_port,
   protocol_version=4,
   connect_timeout=60,
   idle_heartbeat_interval=0,
   execution_profiles={EXEC_PROFILE_DEFAULT: default_profile}
)
session = cluster.connect()

name_1, name_2, name_3 = "Bob", "Jack", "Alex"
age_1, age_2, age_3 = 25, 30, 18

cql_statement = "INSERT INTO users (name, age) VALUES (%s, %s)"

batch = BatchStatement(retry_policy=RetryPolicy)
batch.add(SimpleStatement(cql_statement, (name_1, age_1)))
batch.add(SimpleStatement(cql_statement, (name_2, age_2)))
batch.add(SimpleStatement(cql_statement, (name_3, age_3)))
session.execute(batch)
  • why do you need that batch? if you just want to re-insert entry, just do insert - it will overwrite existing data because in Cassandra everything is UPSERT... Also, why do you need retry policy? just leave it & the default one will be used – Alex Ott Dec 13 '20 at 10:16
  • @AlexOtt hello! As I said in my post, I have several CQL operations that I am trying to execute with a single query. I understand that each of the requests can be executed separately. Suppose I need to do an insert of a set of data (CQL query is the same, arguments are different). I already tried to remove the `reply_policy` property, but the error remained the same. By default, it is set to `None`, if you look under the hood of `BatchStatement`. Do you have any ideas, my friend? – Nurzhan Nogerbek Dec 13 '20 at 10:36
  • batch of delete + insert on the same primary key is tricky, and may not behave how you think it should behave... Please expand in the post what do you want to achieve, not how do you want to achieve. Why you need to have delete + insert combination in the first place. also, adding the schema of the table will help – Alex Ott Dec 13 '20 at 10:42
  • I understand. Here in the post is just an example. I actually need to perform an `INSERT` operation with different parameters only. I updated the post. Can you please check it out? – Nurzhan Nogerbek Dec 13 '20 at 10:53

1 Answer


Well, I finally found the error.

First, I removed the retry_policy argument from the BatchStatement (it expects a RetryPolicy instance, not the class itself, and the default is fine). The real mistake was that I put the CQL arguments inside SimpleStatement: its second positional parameter is retry_policy, so the parameter tuple is what triggered the ValueError. The parameters belong in batch.add() instead.
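To see why the error points at retry_policy, note that SimpleStatement's second positional parameter is retry_policy, not the query parameters. A minimal stand-in (FakeSimpleStatement is a hypothetical class, not the driver's real one) mimicking that signature shows how the tuple lands in the wrong slot:

```python
# Hypothetical stand-in mimicking the driver's signature:
#   SimpleStatement(query_string, retry_policy=None, ...)
class FakeSimpleStatement:
    def __init__(self, query_string, retry_policy=None):
        # The driver performs a similar duck-type check on retry_policy
        if retry_policy is not None and not hasattr(retry_policy, "on_read_timeout"):
            raise ValueError(
                "retry_policy should implement cassandra.policies.RetryPolicy")
        self.query_string = query_string
        self.retry_policy = retry_policy

# Passing the CQL arguments positionally puts the tuple into retry_policy:
try:
    FakeSimpleStatement(
        "INSERT INTO users (name, age) VALUES (%s, %s)", ("Bob", 25))
except ValueError as e:
    print("raised:", e)
```

The same mix-up explains the original traceback without ever touching BatchStatement's own retry_policy.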

Here is a working code snippet:

...
from cassandra.query import BatchType  # required for BatchType.UNLOGGED

batch = BatchStatement(batch_type=BatchType.UNLOGGED)
batch.add(SimpleStatement(cql_statement), (name_1, age_1))  # parameters go to add(), not SimpleStatement
batch.add(SimpleStatement(cql_statement), (name_2, age_2))
batch.add(SimpleStatement(cql_statement), (name_3, age_3))
session.execute(batch)

EDITED:

In the end, I abandoned BatchStatement after the comments left at the bottom of this post. I beg you to pay attention to them! CQL batches are not the same as RDBMS batches: they are not a write optimization, but a way to achieve atomic updates of a denormalized record across multiple tables.
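Following the advice in the comments, the fast path for bulk inserts is a prepared statement plus execute_async. A hedged sketch (the helper name bulk_insert is an assumption; session is assumed to be an open driver Session and the users table comes from the question):

```python
def bulk_insert(session, rows):
    """Insert (name, age) rows using a prepared statement + execute_async.

    `session` is assumed to be an open cassandra.cluster.Session;
    the `users` table is the one from the question.
    """
    prepared = session.prepare("INSERT INTO users (name, age) VALUES (?, ?)")
    # Fire all requests without blocking; the driver pipelines them.
    futures = [session.execute_async(prepared, row) for row in rows]
    # Wait for completion so any errors surface here.
    for future in futures:
        future.result()
```

This matches Alex Ott's point below: each insert is sent to the data nodes directly instead of funneling all the work through a single coordinator node.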

  • please don't do it this way - the batches in Cassandra aren't optimization for writing. Really you will make your inserts slower. Use prepared statements instead + execute_async - google for bad & good use of the batches in Cassandra – Alex Ott Dec 13 '20 at 11:44
  • Thank you very much for your response. Apparently, you say this from a personally painful experience. But shouldn't `BatchStatement` work faster than executing individual queries? Imagine you need to record 100,000 records. Brute force will take a long time. Isn't it? – Nurzhan Nogerbek Dec 13 '20 at 19:57
  • No. It won’t be faster because all load will be put onto the coordinator node that will dispatch queries to nodes holding data, instead of the driver sending these queries directly to those nodes. The fastest way to put a lot of data is to use prepared queries and execute_async... or use an external tool like DSBulk – Alex Ott Dec 13 '20 at 20:01
  • I recommend grabbing the free Cassandra book from DataStax... – Alex Ott Dec 13 '20 at 20:50
  • Alex is correct. CQL batches are not the same as RDBMS batches. CQL batches are not an optimisation but for achieving atomic updates of a denormalised record across multiple tables. If it helps, I've explained them in this post -- https://community.datastax.com/articles/2744/. Cheers! – Erick Ramirez Dec 15 '20 at 05:13
  • @ErickRamirez thank you for your comment and useful article. I will update my post with your comments for future visitors. – Nurzhan Nogerbek Dec 15 '20 at 06:07
  • @NurzhanNogerbek Hi, I know we shouldn't use CQL batches, but I have a list containing data; each time I have to loop through the list and add the data to the db one row at a time, so it is painfully slow without a batch. How can I achieve the speed of adding a single row? – frankenstein Mar 24 '23 at 11:54