
I am using containers to run both app servers & Cassandra nodes.

When starting an app server container, I need to specify which Cassandra node (1..n) it connects to. How would you divide the workload?

  1. One app container to one or more Cassandra nodes (how many?).
  2. One or more app containers to one Cassandra node (how many?).
  3. Many to many (how many?).

This is for a production setup with a 100% uptime requirement. Each read from Cassandra is small, but there are many of them.

It should be scalable so I can add more app containers, like pods in Kubernetes. A pod is a set of containers that makes up a granule of the application.
Therefore I am looking for the best possible grouping of containers (Cassandra and app server) that will scale.

Info: Kubernetes is too expensive a setup in the beginning, and while waiting for Docker Swarm to reach a stable release I will do this manually. Any insight is welcome.

Regards

Tibebes. M
Chris G.
We need a lot more information. Is this setup for a local test or a production deploy? If it's a production deploy, then what's the load and the uptime requirement? In general many-to-many is preferred for fault tolerance, but I need to know your use case to say anything. – Usman Ismail Sep 24 '15 at 20:20

2 Answers


Please see https://github.com/kubernetes/kubernetes/blob/release-1.0/examples/cassandra/README.md for a tutorial on running Cassandra on Kubernetes.

You will also need to add best practices like snapshotting the database to persistent storage, and other such things.

(And why do you say that Kubernetes is expensive? Google Container Engine charges only the cost of the VMs for small clusters, and you can deploy open-source Kubernetes yourself for free.)

brendan
Thanks. How would you divide the workload in a pod? Like 3 app containers to 3 Cassandra nodes plus one seed node in a pod? – Chris G. Sep 28 '15 at 12:00

Don't run the app container and Cassandra node inside of the same pod. You want to be able to scale your Cassandra cluster independently of your application.
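For the application side, a separate replication controller lets you scale the app tier on its own. A minimal sketch of what that might look like (the image name, labels, and port here are hypothetical placeholders for your app server):

```yaml
# app-replication-controller.yml (hypothetical app image and labels)
apiVersion: v1
kind: ReplicationController
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    name: myapp
  template:
    metadata:
      labels:
        name: myapp
    spec:
      containers:
        - image: yourorg/myapp   # placeholder for your app server image
          name: myapp
          ports:
            - containerPort: 8080
```

You can then scale the app pods with `kubectl scale rc myapp --replicas=5` without touching the Cassandra cluster, and vice versa.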

For the Cassandra side of things, I suggest:

  • A replication controller so you can easily scale your number of Cassandra nodes. Luckily for us, C* nodes are all the same.
  • A Cassandra service so that your application pods have a stable endpoint through which they can talk to C*.
  • A headless Kubernetes service to provide your Cassandra pods with seed node IP addresses.

You will need to have DNS working in your Kubernetes cluster.

The Cassandra Replication Controller

cassandra-replication-controller.yml

apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    name: cassandra
  name: cassandra
spec:
  replicas: 1
  selector:
    name: cassandra
  template:
    metadata:
      labels:
        name: cassandra
    spec:
      containers:
        - image: vyshane/cassandra
          name: cassandra
          env:
            # Feel free to change the following:
            - name: CASSANDRA_CLUSTER_NAME
              value: Cassandra
            - name: CASSANDRA_DC
              value: DC1
            - name: CASSANDRA_RACK
              value: Kubernetes Cluster
            - name: CASSANDRA_ENDPOINT_SNITCH
              value: GossipingPropertyFileSnitch

            # The peer discovery domain needs to point to the Cassandra peer service
            - name: PEER_DISCOVERY_DOMAIN
              value: cassandra-peers.default.cluster.local.
          ports:
            - containerPort: 9042
              name: cql
          volumeMounts:
            - mountPath: /var/lib/cassandra/data
              name: data
      volumes:
        - name: data
          emptyDir: {}
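Note that `emptyDir` storage is ephemeral: the data directory is wiped whenever the pod is rescheduled. In line with the persistent-storage best practice mentioned in the other answer, for production you would likely swap in a persistent volume instead, e.g. (the claim name is hypothetical and the claim would need to be created separately):

```yaml
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: cassandra-data   # hypothetical PVC; create it beforehand
```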

The Cassandra Service

The Cassandra service is pretty simple. Add the Thrift port if you need it.

cassandra-service.yml

apiVersion: v1
kind: Service
metadata:
  labels:
    name: cassandra
  name: cassandra
spec:
  ports:
    - port: 9042
      name: cql
  selector:
    name: cassandra

The Cassandra Peer Discovery Service

This is a headless Kubernetes service that provides the IP addresses of Cassandra peers via DNS A records. The peer service definition looks like this:

cassandra-peer-service.yml

apiVersion: v1
kind: Service
metadata:
  labels:
    name: cassandra-peers
  name: cassandra-peers
spec:
  clusterIP: None
  ports:
    - port: 7000
      name: intra-node-communication
    - port: 7001
      name: tls-intra-node-communication
  selector:
    name: cassandra

The Cassandra Docker Image

We extend the official Cassandra image thus:

Dockerfile

FROM cassandra:2.2
MAINTAINER Vy-Shane Xie <shane@node.mu>
ENV REFRESHED_AT 2015-09-16

RUN apt-get -qq update && \
    DEBIAN_FRONTEND=noninteractive apt-get -yq install dnsutils && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

COPY custom-entrypoint.sh /
ENTRYPOINT ["/custom-entrypoint.sh"]
CMD ["cassandra", "-f"]

Notice the custom-entrypoint.sh script. It simply configures the seed nodes by querying our Cassandra peer discovery service:

custom-entrypoint.sh

#!/bin/bash
#
# Configure Cassandra seed nodes.

my_ip=$(hostname --ip-address)

CASSANDRA_SEEDS=$(dig $PEER_DISCOVERY_DOMAIN +short | \
    grep -v $my_ip | \
    sort | \
    head -2 | xargs | \
    sed -e 's/ /,/g')

export CASSANDRA_SEEDS

/docker-entrypoint.sh "$@"
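To see what that pipeline produces, here is a standalone simulation of the seed selection using canned `dig` output (the pod IPs are hypothetical):

```shell
#!/bin/bash
# Simulate the seed-selection pipeline with sample A records.
my_ip=10.244.1.5

# What `dig $PEER_DISCOVERY_DOMAIN +short` might return: one A record per pod.
dig_output='10.244.1.5
10.244.2.7
10.244.3.9
10.244.0.2'

# Drop our own IP, sort, keep at most two peers, and join them with commas.
CASSANDRA_SEEDS=$(printf '%s\n' "$dig_output" | \
    grep -v "$my_ip" | \
    sort | \
    head -2 | xargs | \
    sed -e 's/ /,/g')

echo "$CASSANDRA_SEEDS"   # 10.244.0.2,10.244.2.7
```

Each node ends up with up to two seeds other than itself, so every pod started from the same image converges on a consistent seed list.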

Starting Cassandra

To start Cassandra, simply run:

kubectl create -f cassandra-peer-service.yml
kubectl create -f cassandra-service.yml
kubectl create -f cassandra-replication-controller.yml

This will give you a one-node Cassandra cluster. To add another node:

kubectl scale rc cassandra --replicas=2

Talking to Cassandra

Your application pods can connect to Cassandra using the cassandra hostname. It points to the Cassandra service.
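For example, an app container could take its Cassandra contact point from the environment; the variable names below are hypothetical and up to your application:

```yaml
# Fragment of an app pod spec (hypothetical env variable names)
env:
  - name: CASSANDRA_HOST
    value: cassandra    # the Service name; resolves via cluster DNS
  - name: CASSANDRA_PORT
    value: "9042"       # the cql port exposed by the Service
```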

Show me the code

I made a GitHub repo with the above setup: Multinode Cassandra Cluster on Kubernetes.

Vy-Shane