Cassandra schema agreement with Ec2MultiRegionSnitch

Question

I'm stumped by a problem with my multi-datacentre Cassandra cluster. It's a brand-new cluster of six nodes (three in eu-west, three in us-west-2). Security groups are configured so that each node can communicate with the external IP of the others. The listen address is set to the local VPC IP, and the broadcast address is set to each node's public IP.

Everything seems OK:

Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Owns (effective)  Host ID                               Token                                    Rack
UN  (public ip)  121.3 KB   100.0%            b15c18bf-1689-4308-bbe2-d36d38f7c8ea  -9103428429654321414                     2b
UN  (public ip)  46.57 KB   100.0%            89378b79-4228-4b44-a3e3-c6d2f3bbd368  -9174198879812166340                     2b
UN  (public ip)  46.58 KB   100.0%            4cbd586f-963c-4339-abaa-af313e023abe  -9223053993127788404                     2b

Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Owns (effective)  Host ID                               Token                                    Rack
UN  (public ip)  46.59 KB   100.0%            2aad2d39-0099-4ae3-ae46-a1558b1b657c  -9163190464402129696                     1c
UN  (public ip)  98.55 KB   100.0%            94748d93-cf56-4cde-8b44-f75d17b41924  -9211541808465956929                     1c
UN  (public ip)  84.5 KB    100.0%            3cdeba13-3026-4a1b-a8d1-63eef25049cb  -9196529642979836746                     1c

So, I create the keyspaces I need.

But when I try to connect my Thrift app to the cluster, I see the following error from Astyanax:

Caused by: com.netflix.astyanax.connectionpool.exceptions.SchemaDisagreementException: 
    SchemaDisagreementException: [host=(internal ip):9160, latency=10002(10007), 
    attempts=1] Can't change schema due to pending schema agreement

I assume this is because the new keyspace didn't replicate properly to the other nodes, but I can't work out why. If I run nodetool describecluster, it gives me this (bearing in mind that I'm using Ec2MultiRegionSnitch, but for some reason this shows as DynamicEndpointSnitch):

Cluster Information:
Name: mycluster_multiregion
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
    UNREACHABLE: [(public IP of this node)]

    f9de7b22-1486-37c6-8487-801 [(list of other node public IPs)]

It's the same on every node - it considers itself unreachable. This is technically correct; in EC2 VPC, it's not possible for a node to communicate with itself using its public IP, due to NAT. But, I'm not sure whether or not this is causing my schema disagreement problem, and if it is, I'm not certain there's a simple solution.

Any insight appreciated!

score 1 · Answer 1 · answered Jul 16 '14 at 07:33

As described here: http://nsinfra.blogspot.in/2013/06/cassandra-schema-disagreement-problem.html

Can you try syncing the clocks on all nodes using NTP?

From the AWS docs, "Configuring Network Time Protocol (NTP)":

"Network Time Protocol (NTP) is configured by default on Amazon Linux instances; however, an instance needs access to the Internet for the standard NTP configuration to work. The procedures in this section show how to verify that the default NTP configuration is working correctly. If your instance does not have access to the Internet, you need to configure NTP to query a different server in your private network to keep accurate time."

Maybe for EC2 VPC you need to configure NTP to use the AWS time servers (x.amazon.pool.ntp.org).
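
After syncing the clocks, one way to see whether the nodes have converged is to re-run nodetool describecluster on each node and compare the schema versions. Below is a minimal Python sketch that parses that output. It assumes each entry under "Schema versions:" has the form "version: [address, address]", as in the output in the question; the function names are hypothetical, and whether an UNREACHABLE entry should count as disagreement is left to you.

```python
import re
from collections import defaultdict

# Assumed entry format under "Schema versions:", e.g. "<version>: [ip1, ip2]".
_VERSION_LINE = re.compile(r"^\s*([^:\s]+):\s*\[(.*)\]\s*$")


def parse_schema_versions(describecluster_output):
    """Return {schema_version: [addresses]} from `nodetool describecluster` text."""
    versions = defaultdict(list)
    in_section = False
    for line in describecluster_output.splitlines():
        if line.strip() == "Schema versions:":
            in_section = True
            continue
        if not in_section:
            continue  # skip Name/Snitch/Partitioner lines before the section
        match = _VERSION_LINE.match(line)
        if match:
            version, nodes = match.groups()
            versions[version].extend(
                n.strip() for n in nodes.split(",") if n.strip()
            )
    return dict(versions)


def live_schema_versions(versions):
    """Schema versions reported by reachable nodes (UNREACHABLE excluded)."""
    return sorted(v for v in versions if v != "UNREACHABLE")
```

More than one entry returned by live_schema_versions would mean the reachable nodes disagree on the schema. For output like the one in the question, it would return a single version, with each node's own public IP listed under "UNREACHABLE" in the parsed dictionary.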
