You should be able to connect using Spark 3 and version 3.x of the Spark Cassandra Connector. Here are some steps to validate that you have set up the connection correctly and have the right permissions.
- Make sure you have permissions to read the system tables.
- If you have set up a VPC endpoint (VPCE), ensure you have permission to describe VPC endpoints.
- In your configuration, make sure that hostname-validation is set to false in the SSL config.
You should be able to execute the following query against the system.peers table and retrieve the IPs of the public or private endpoint. If you see one peer or none, revisit the steps above. Remember that the AWS console is not in your VPC and will contact the public endpoint, similar to S3.
SELECT * FROM system.peers
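As a quick check from Spark itself, the sketch below reads system.peers through the connector and counts the peers returned. It assumes your SparkSession already points at the driver config shown further down; the app name is just a placeholder.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("keyspaces-peer-check") // placeholder app name
  .getOrCreate()

// Read system.peers via the Spark Cassandra Connector
val peers = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "system", "table" -> "peers"))
  .load()

peers.show(false)
// One peer or none usually means the EC2 describe permissions above are missing
println(s"Peer count: ${peers.count()}")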
Sample policy. You need to grant access to the resource /keyspace/system* as well as "ec2:DescribeNetworkInterfaces" and "ec2:DescribeVpcEndpoints" for your VPC.
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "cassandra:Select",
            "cassandra:Modify"
         ],
         "Resource":[
            "arn:aws:cassandra:us-east-1:111122223333:/keyspace/mykeyspace/table/mytable",
            "arn:aws:cassandra:us-east-1:111122223333:/keyspace/system*"
         ]
      },
      {
         "Sid":"ListVPCEndpoints",
         "Effect":"Allow",
         "Action":[
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeVpcEndpoints"
         ],
         "Resource":"*"
      }
   ]
}
Set up the connection by referencing the external config file.
-conf":"spark.cassandra.connection.config.profile.path=application.conf"
Sample driver config.
datastax-java-driver {
    basic.request.consistency = "LOCAL_QUORUM"
    basic.contact-points = [ "cassandra.us-east-1.amazonaws.com:9142" ]
    advanced.reconnect-on-init = true

    basic.load-balancing-policy {
        local-datacenter = "us-east-1"
    }

    advanced.auth-provider = {
        class = PlainTextAuthProvider
        username = "user-at-sample"
        password = "S@MPLE=PASSWORD="
    }

    advanced.throttler = {
        class = ConcurrencyLimitingRequestThrottler
        max-concurrent-requests = 30
        max-queue-size = 2000
    }

    advanced.ssl-engine-factory {
        class = DefaultSslEngineFactory
        hostname-validation = false
    }

    advanced.connection.pool.local.size = 1
}
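With that config in place, and reusing the SparkSession from the sketch above, a read against your own table looks like this (mykeyspace and mytable are just the sample names from the policy):

// Read the sample table through the connector using the external driver config
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "mykeyspace", "table" -> "mytable"))
  .load()

df.show()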