I am loading some graph data using the Titan API, with Cassandra configured as the storage backend. My graph has around 1 million vertices, and I want this data to be distributed across N Cassandra nodes.
To do this, I configured 3 nodes on the same system, with the IPs 127.0.0.1, 127.0.0.2 and 127.0.0.3. The output of nodetool status shows all 3 IPs and load shared equally.
I tried loading a graph, but the whole data set is replicated on all 3 nodes (1M vertices on node1, 1M vertices on node2 and 1M vertices on node3). I want the data to be distributed across the 3 nodes instead: roughly 1M/3 on node1, 1M/3 on node2 and 1M/3 on node3.
Output of DESCRIBE KEYSPACE TITAN:
CREATE KEYSPACE titan WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
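To make my expectation concrete, here is a rough sketch of the per-node share I would expect for a given replication factor. It assumes the token ranges are evenly balanced across the nodes, which may not hold for my setup; the function name is my own and is not part of Titan or Cassandra.

```python
def expected_vertices_per_node(total_vertices: int, node_count: int,
                               replication_factor: int) -> float:
    """Approximate number of vertices stored on each node.

    Assumption: token ranges are evenly balanced, so every node owns
    the same fraction of the ring. Each vertex is stored on
    `replication_factor` nodes.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    if not 1 <= replication_factor <= node_count:
        raise ValueError("replication_factor must be between 1 and node_count")
    return total_vertices * replication_factor / node_count


# What I expect with replication_factor = 1 on 3 nodes (about 1M/3 each)
print(expected_vertices_per_node(1_000_000, 3, 1))

# What I seem to observe, which would correspond to replication_factor = 3
print(expected_vertices_per_node(1_000_000, 3, 3))
```

With a replication factor of 1, as in the keyspace definition above, I would expect about a third of the vertices per node, not a full copy on each.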
Output of nodetool status:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  7.79 MB    1       ?     f5a689f0-f4c1-4f68-ab81-58066e986cd4  rack1
UN  127.0.0.2  229.79 KB  1       ?     b6940e7e-b6eb-4d1f-959e-b5bd0f5cea15  rack1
UN  127.0.0.3  7.11 MB    1       ?     a3244b16-a73c-4801-868f-05de09615ed9  rack1
Can someone please share the correct configuration for distributing the load across the nodes? Please correct me if anything here is wrong.
Thanks, Hari