
I have an API, a Flask application written in Python and deployed on AWS EC2. Some endpoints need to connect to Amazon Keyspaces to run a query. But cluster.connect() is too slow: it takes 5 seconds to connect before the query can run.

To solve this, I open a connection when the application starts (the application is redeployed through CodePipeline whenever a commit lands on the master branch), and the connection then stays open the whole time.

I didn't find anything in the Python Cassandra driver documentation that advises against this. Is there any potential problem with this solution?

2 Answers


Could you provide the current connection configuration?

Amazon Keyspaces uses Transport Layer Security (TLS) communication by default. If you're not providing the certificate when connecting, adding it could help speed things up. For a complete example, check out the Keyspaces Python Sample.

You can also try disabling the following options, which should shorten the initial connection time.

    schema_metadata_enabled = False
    token_metadata_enabled = False

Python Driver Documentation

    from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED

    import boto3
    from cassandra.cluster import Cluster
    from cassandra_sigv4.auth import SigV4AuthProvider

    # Amazon Keyspaces uses TLS; point this at the downloaded root certificate.
    ssl_context = SSLContext(PROTOCOL_TLSv1_2)
    ssl_context.load_verify_locations('path_to_file/sf-class2-root.crt')
    ssl_context.verify_mode = CERT_REQUIRED

    # Authenticate with SigV4 using credentials from the default boto3 credential chain.
    boto_session = boto3.Session()
    auth_provider = SigV4AuthProvider(boto_session)

    cluster = Cluster(['cassandra.us-east-2.amazonaws.com'],
                      ssl_context=ssl_context,
                      auth_provider=auth_provider,
                      port=9142,
                      # Skip schema and token metadata refresh; may shorten connect().
                      schema_metadata_enabled=False,
                      token_metadata_enabled=False)

    # Connect once, then reuse this session for every query.
    session = cluster.connect()
    r = session.execute('select * from system_schema.keyspaces')
    print(r.current_rows)
MikeJPR
  • Disabling the metadata options had no effect for me. It still takes 5.9 s to connect to a local Cassandra instance! – Esmailian Aug 01 '21 at 01:17
  • @Esmailian A local Cassandra instance should be quick to connect to. If your Cassandra node is running on localhost, I suspect you are having memory issues. The suggestion above was for Amazon Keyspaces, a serverless NoSQL database service. https://aws.amazon.com/keyspaces/ – MikeJPR Aug 09 '21 at 22:34

This is the recommended way: open the connection at startup and keep it open (and have one connection per application). Opening a connection to a Cassandra cluster is an expensive operation, because besides the connection itself, the driver discovers the cluster topology, calculates token ranges, and does many other things. For a "normal" Cassandra cluster this usually shouldn't take very long (though it is still expensive), and AWS's emulation may add extra latency on top of that.
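A minimal sketch of the open-once-and-reuse pattern for a threaded Flask/WSGI app, assuming the Keyspaces settings from the other answer. LazyConnection and connect_keyspaces are hypothetical names (not part of the driver), and the route in the comment is illustrative only. The session is created on first use rather than at import time; in a pre-fork server such as gunicorn this means each worker builds its own session after forking, which, as far as I know, is what the driver's FAQ on WSGI applications recommends, since a session created before a fork is not reliable in the child processes.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyConnection(Generic[T]):
    """Hypothetical helper: build an expensive object (e.g. a driver Session) once, then reuse it."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._value: Optional[T] = None

    def get(self) -> T:
        # Fast path: once the value exists, no lock is taken.
        if self._value is None:
            with self._lock:
                # Re-check inside the lock: another thread may have connected
                # while this one was waiting.
                if self._value is None:
                    self._value = self._factory()
        return self._value


def connect_keyspaces():
    """Hypothetical factory for Amazon Keyspaces, mirroring the other answer's setup."""
    # Imported lazily so this module can be imported without cassandra-driver,
    # cassandra-sigv4 or boto3 installed (only the factory needs them).
    from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED

    import boto3
    from cassandra.cluster import Cluster
    from cassandra_sigv4.auth import SigV4AuthProvider

    ssl_context = SSLContext(PROTOCOL_TLSv1_2)
    ssl_context.load_verify_locations("path_to_file/sf-class2-root.crt")
    ssl_context.verify_mode = CERT_REQUIRED

    cluster = Cluster(
        ["cassandra.us-east-2.amazonaws.com"],
        port=9142,
        ssl_context=ssl_context,
        auth_provider=SigV4AuthProvider(boto3.Session()),
    )
    return cluster.connect()


# One shared holder per process; nothing connects until the first get() call.
keyspaces = LazyConnection(connect_keyspaces)

# In a Flask view (hypothetical route), every request reuses the same Session:
#
#   @app.route("/items")
#   def items():
#       rows = keyspaces.get().execute("SELECT * FROM system_schema.keyspaces")
#       return {"count": len(rows.current_rows)}
```

The first request in each process still pays the connection cost; if that matters, calling keyspaces.get() from a worker start-up hook (for example gunicorn's post_fork) should move it out of the request path.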

Alex Ott