I'm stumped by a problem with my multi-datacentre Cassandra cluster. It's a brand new cluster of six nodes (three in eu-west, three in us-west-2). Security groups are configured so that each node can communicate with the external IP of the others. The listen address is set to each node's local VPC IP, and the broadcast address is set to each node's public IP.
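One way to double-check the security group setup is a quick TCP probe from each node to the public IPs of the others. The following is only a minimal Python sketch: port 7000 is an assumption (Cassandra's default storage_port for unencrypted inter-node traffic), and the function names are made up for illustration.

```python
import socket

# Assumption: default unencrypted inter-node port; adjust if storage_port differs.
STORAGE_PORT = 7000


def can_connect(host, port=STORAGE_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_peers(peers, port=STORAGE_PORT, timeout=3.0):
    """Map each peer address to whether it accepted a TCP connection."""
    return {peer: can_connect(peer, port, timeout) for peer in peers}
```

Calling check_peers from each node with the other five public IPs should give True for every entry if the security groups are right. Probing the node's own public IP would be expected to fail in a VPC, given the NAT behaviour described further down.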
nodetool status shows everything as OK:
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Owns (effective)  Host ID                               Token                 Rack
UN  (public ip)  121.3 KB  100.0%            b15c18bf-1689-4308-bbe2-d36d38f7c8ea  -9103428429654321414  2b
UN  (public ip)  46.57 KB  100.0%            89378b79-4228-4b44-a3e3-c6d2f3bbd368  -9174198879812166340  2b
UN  (public ip)  46.58 KB  100.0%            4cbd586f-963c-4339-abaa-af313e023abe  -9223053993127788404  2b
Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Owns (effective)  Host ID                               Token                 Rack
UN  (public ip)  46.59 KB  100.0%            2aad2d39-0099-4ae3-ae46-a1558b1b657c  -9163190464402129696  1c
UN  (public ip)  98.55 KB  100.0%            94748d93-cf56-4cde-8b44-f75d17b41924  -9211541808465956929  1c
UN  (public ip)  84.5 KB   100.0%            3cdeba13-3026-4a1b-a8d1-63eef25049cb  -9196529642979836746  1c
So I created the keyspaces I need.
But when I try to connect my Thrift app to the cluster, I see the following error from Astyanax:
Caused by: com.netflix.astyanax.connectionpool.exceptions.SchemaDisagreementException:
SchemaDisagreementException: [host=(internal ip):9160, latency=10002(10007),
attempts=1] Can't change schema due to pending schema agreement
I assume this is because the new keyspace's schema didn't propagate properly to the other nodes, but I can't work out why. If I run nodetool describecluster, it gives me this (bearing in mind that I'm using Ec2MultiRegionSnitch, but for some reason this shows as DynamicEndpointSnitch):
Cluster Information:
Name: mycluster_multiregion
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
UNREACHABLE: [(public IP of this node)]
f9de7b22-1486-37c6-8487-801 [(list of other node public IPs)]
It's the same on every node: each one considers itself unreachable. This is technically correct; in an EC2 VPC, a node can't communicate with itself using its public IP, due to NAT. But I'm not sure whether this is causing my schema disagreement problem, and if it is, I'm not certain there's a simple solution.
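For comparing the describecluster output across all six nodes, a small parser for the "Schema versions" section may help. This is a rough Python sketch that assumes the layout pasted above (one line per schema version, with the addresses in square brackets); the function names are hypothetical.

```python
import re


def parse_schema_versions(text):
    """Return {schema version or 'UNREACHABLE': [addresses]} from describecluster output."""
    versions = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Schema versions:"):
            in_section = True
            continue
        if not in_section or not stripped:
            continue
        # Key, optional colon, then a bracketed comma-separated address list.
        match = re.match(r"^([^:\s]+):?\s*\[(.*)\]$", stripped)
        if match:
            key, hosts = match.groups()
            versions[key] = [h.strip() for h in hosts.split(",") if h.strip()]
    return versions


def unreachable_nodes(versions):
    """Addresses listed under UNREACHABLE, or an empty list."""
    return versions.get("UNREACHABLE", [])


def schema_agrees(versions):
    """True when the reachable nodes report exactly one schema version."""
    reachable = [key for key in versions if key != "UNREACHABLE"]
    return len(reachable) == 1
```

Feeding it the output from each node would show whether more than one schema version is in play, separately from the node that lists itself as unreachable.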
Any insight appreciated!