
I need to configure a highly available Graylog2 cluster that is split across 2 data centers. If the first data center goes completely down, the second must continue to operate, and vice versa (with a load balancer at the front, of course).

For example, each data center could have 1 Elasticsearch, 1 Graylog, and 2 MongoDB instances. In total that makes 2 Elasticsearch, 2 Graylog, and 4 MongoDB instances.

As I read in the MongoDB documentation, I need an odd number of voters. So assume that only 3 of them are voters (2 in the first data center and 1 in the second).

With some configuration, Elasticsearch works as expected. But MongoDB does not. :(

So is it possible to build a highly available configuration with 2 data centers, under the constraint that either data center may be completely down?

Finally, I want to share my configs. Note: my current setup has just 2 MongoDB instances.

Thanks.

Elasticsearch 1st:

  cluster.name: graylog
  node.name: graylog-1
  network.host: 0.0.0.0
  http.port: 9200
  discovery.zen.ping.multicast.enabled: false
  discovery.zen.ping.unicast.hosts: ["10.0.0.2"]
  discovery.zen.minimum_master_nodes: 1
  index.number_of_replicas: 2

Elasticsearch 2nd:

  cluster.name: graylog
  node.name: graylog-2
  network.host: 0.0.0.0
  http.port: 9200
  discovery.zen.ping.multicast.enabled: false
  discovery.zen.ping.unicast.hosts: ["10.0.0.1"]
  discovery.zen.minimum_master_nodes: 1

MongoDB 1st and 2nd (rs.conf()):

  {
        "_id" : "rs0",
        "version" : 4,
        "protocolVersion" : NumberLong(1),
        "members" : [
                {
                        "_id" : 0,
                        "host" : "10.0.0.1:27017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 1,
                        "host" : "10.0.0.2:27017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "getLastErrorModes" : {

                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("****")
        }
  }
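For the odd number of voters mentioned above, the replica set would need a third voting member — ideally an arbiter in a third location, so that either data center can still reach a majority with the arbiter's vote. A minimal sketch in the mongo shell (the 10.0.0.5 address is a hypothetical host at a third site):

```
// Run against the current PRIMARY.
// 10.0.0.5 is a hypothetical arbiter host in a third location;
// an arbiter votes in elections but stores no data.
rs.addArb("10.0.0.5:27017")

// Verify: the set should now report 3 members.
rs.conf().members.length
```

Note that with 2 voters in one data center and 1 in the other (as described above), losing the 2-voter data center still leaves the remaining site without a majority — which is why a third, independent location matters.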

Graylog 1st:

  is_master = true
  node_id_file = /etc/graylog/server/node-id
  password_secret = ***
  root_password_sha2 = ***
  plugin_dir = /usr/share/graylog-server/plugin
  rest_listen_uri = http://10.0.0.1:9000/api/
  web_listen_uri = http://10.0.0.1:9000/
  rotation_strategy = count
  elasticsearch_max_docs_per_index = 20000000
  elasticsearch_max_number_of_indices = 20
  retention_strategy = delete
  elasticsearch_shards = 2
  elasticsearch_replicas = 1
  elasticsearch_index_prefix = graylog
  allow_leading_wildcard_searches = false
  allow_highlighting = false
  elasticsearch_discovery_zen_ping_unicast_hosts = 10.0.0.1:9300, 10.0.0.2:9300
  elasticsearch_network_host = 0.0.0.0
  elasticsearch_analyzer = standard
  output_batch_size = 500
  output_flush_interval = 1
  output_fault_count_threshold = 5
  output_fault_penalty_seconds = 30
  processbuffer_processors = 5
  outputbuffer_processors = 3
  processor_wait_strategy = blocking
  ring_size = 65536
  inputbuffer_ring_size = 65536
  inputbuffer_processors = 2
  inputbuffer_wait_strategy = blocking
  message_journal_enabled = true
  message_journal_dir = /var/lib/graylog-server/journal
  lb_recognition_period_seconds = 3
  mongodb_uri = mongodb://10.0.0.1,10.0.0.2/graylog
  mongodb_max_connections = 1000
  mongodb_threads_allowed_to_block_multiplier = 5
  content_packs_dir = /usr/share/graylog-server/contentpacks
  content_packs_auto_load = grok-patterns.json
  proxied_requests_thread_pool_size = 32

Graylog 2nd:

  is_master = false
  node_id_file = /etc/graylog/server/node-id
  password_secret = ***
  root_password_sha2 = ***
  plugin_dir = /usr/share/graylog-server/plugin
  rest_listen_uri = http://10.0.0.2:9000/api/
  web_listen_uri = http://10.0.0.2:9000/
  rotation_strategy = count
  elasticsearch_max_docs_per_index = 20000000
  elasticsearch_max_number_of_indices = 20
  retention_strategy = delete
  elasticsearch_shards = 2
  elasticsearch_replicas = 1
  elasticsearch_index_prefix = graylog
  allow_leading_wildcard_searches = false
  allow_highlighting = false
  elasticsearch_discovery_zen_ping_unicast_hosts = 10.0.0.1:9300, 10.0.0.2:9300
  elasticsearch_transport_tcp_port = 9350
  elasticsearch_network_host = 0.0.0.0
  elasticsearch_analyzer = standard
  output_batch_size = 500
  output_flush_interval = 1
  output_fault_count_threshold = 5
  output_fault_penalty_seconds = 30
  processbuffer_processors = 5
  outputbuffer_processors = 3
  processor_wait_strategy = blocking
  ring_size = 65536
  inputbuffer_ring_size = 65536
  inputbuffer_processors = 2
  inputbuffer_wait_strategy = blocking
  message_journal_enabled = true
  message_journal_dir = /var/lib/graylog-server/journal
  lb_recognition_period_seconds = 3
  mongodb_uri = mongodb://10.0.0.1,10.0.0.2/graylog
  mongodb_max_connections = 1000
  mongodb_threads_allowed_to_block_multiplier = 5
  content_packs_dir = /usr/share/graylog-server/contentpacks
  content_packs_auto_load = grok-patterns.json
  proxied_requests_thread_pool_size = 32
Fethi
1 Answer


There are a lot of misconceptions in your configuration files.

For example, in your Elasticsearch configuration you wrote:

discovery.zen.minimum_master_nodes: 2

How would that ever work if one of the two ES nodes were down?

And in your Graylog configuration you wrote:

elasticsearch_shards = 2
elasticsearch_replicas = 1

How would that ever work if one of the two ES nodes were down?

Short answer: It's not easy to create a highly available cluster with autonomous parts in two different data centers (over WAN).

I'd recommend a different architecture, e.g. using RabbitMQ or Apache Kafka to buffer log messages, and letting Graylog (running in one data center) pull messages from there.
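As a rough sketch of that buffering approach: log shippers in each data center write into a Kafka topic, and a Graylog Kafka input later consumes it. The broker address and topic name below are assumptions, not part of any existing setup:

```
# Sketch only: push syslog lines into a Kafka topic that a Graylog
# Kafka input (configured via the Graylog web interface) can consume.
# kafka.example.com:9092 and the topic "graylog-buffer" are assumptions.
tail -F /var/log/syslog | kafka-console-producer.sh \
    --broker-list kafka.example.com:9092 \
    --topic graylog-buffer
```

The point of the design is that the Kafka cluster absorbs messages while Graylog (or its data center) is unreachable, so log producers never depend on Graylog being up.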

joschi
  • Thanks for the answer. In the meantime I have updated my Elasticsearch config above, and now Elasticsearch works as expected — but with a split-brain possibility, right? The MongoDB issue remains, though. I understand your point, but it doesn't give us a fully working system when data center 1 is down, right? – Fethi Dec 13 '16 at 10:27
  • I did not change the shard and replica settings of Elasticsearch, but it seems the primary shards sit on the first machine and the second keeps only replicas (and, as far as I know, this changes when nodes go on and off). What values would you suggest in that case? Thanks – Fethi Dec 13 '16 at 10:32
  • Define "fully working system". With the given setup, you won't have a "working system" anyway, if one data center was down. – joschi Dec 13 '16 at 13:25