Assuming I have 5 machines I want to run an Elasticsearch cluster on, and they are all connected to a shared drive. I put a single copy of Elasticsearch onto that shared drive so all five machines can see it. Do I just start that Elasticsearch from the shared drive on all of my machines and the clustering would automatically work its magic? Or would I have to configure specific settings to get Elasticsearch to realize that it's running on 5 machines? If so, what are the relevant settings? Should I worry about configuring for replicas or is it handled automatically?
-
You're not going to use the shared folder for the index, are you? – javanna May 30 '13 at 19:01
4 Answers
It's super easy.
You'll need each machine to have its own copy of Elasticsearch (simply copy the one you have now) -- the reason is that each machine / node is going to keep its own files that are sharded across the cluster.
The only thing you really need to do is edit the config file to include the name of the cluster.
If all machines have the same cluster name, Elasticsearch will do the rest automatically (as long as the machines are all on the same network).
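For illustration, the relevant line in config/elasticsearch.yml on each machine might look something like this (the cluster name below is just a placeholder):
cluster.name: my-search-cluster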
Read here to get you started: https://www.elastic.co/guide/en/elasticsearch/guide/current/deploy.html
When you create indexes (where the data goes), you define at that time how many replicas you want (they'll be distributed around the cluster).
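As a rough sketch of setting the replica count at index-creation time (the index name myindex and the counts here are placeholders; newer versions also need a -H 'Content-Type: application/json' header):
curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'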

-
Also -- install head plugin. It makes monitoring the state of your indexes a whole lot easier. http://mobz.github.io/elasticsearch-head/ – Transact Charlie May 29 '13 at 18:25
-
Why is it that you need to have separate copies on each machine? Based on what I have seen for single-node machines, you can change the node name to have multiple instances running with the same copy: http://www.concept47.com/austin_web_developer_blog/elasticsearch/how-to-run-multiple-elasticsearch-nodes-on-one-machine/ Is this not applicable when you have separate machines with a single shared drive? I would think that if I set a cluster name for the single copy, I could have each of the machines run that single copy, so the cluster name would theoretically be the same, right, or am I incorrect? – Rolando May 29 '13 at 18:38
-
Each machine (or node) is going to need its own filespace to write Lucene index files. If you change the configuration file (check the link) to point to another directory on the local node then it may work. – Transact Charlie May 29 '13 at 18:48
-
I was under the impression that, with different node names, a single instance of Elasticsearch would automatically be able to tell that another instance is already running and would create separate directories based on the node. (correct me if this is not the correct assumption) – Rolando May 29 '13 at 18:48
-
Why not just try it -- you could always clean it up? Report back - I'd be interested. In the past I've had an install running on each machine because that seemed more redundant and safe. – Transact Charlie May 29 '13 at 18:49
-
I tried it (running one shared copy of ES), and it appears funky, but I do not know if it's me screwing up the configuration, or if your solution is the only/correct way to go. – Rolando May 29 '13 at 18:50
It is usually handled automatically.
If autodiscovery doesn't work, edit the Elasticsearch config file and enable unicast discovery:
Node 1:
cluster.name: mycluster
node.name: "node1"
node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1.example.com"]
Node 2:
cluster.name: mycluster
node.name: "node2"
node.master: false
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1.example.com"]
and so on for nodes 3, 4, and 5. Make node 1 the master, and the rest only data nodes.
Edit: Please note that, as a rule in ES, if you have N nodes, then by convention N/2 + 1 of those nodes should be master-eligible, for fail-over purposes. They may or may not be data nodes, though.
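In pre-7.x versions this quorum rule corresponds to the minimum-master-nodes setting in elasticsearch.yml. A minimal sketch, assuming 3 of the 5 nodes are made master-eligible; every master-eligible node would then carry:
# (3 master-eligible nodes / 2) + 1 = 2; this setting was removed in 7.x
discovery.zen.minimum_master_nodes: 2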
Also, if auto-discovery doesn't work, the most probable reason is that the network doesn't allow it (so it is effectively disabled). If too many auto-discovery pings take place across multiple servers, the resources needed to manage those pings will prevent other services from running correctly.
For example, think of a 10,000-node cluster with all 10,000 nodes doing the auto-pings.

-
For clarification, should all "unicast.hosts" be the IP/FQDN of the master? Seems to be what your example is indicating. – harperville Dec 22 '14 at 18:45
-
According to the elasticsearch.yml comments in 1.7.x, if you set "node.master: false" then the node will NEVER become a master.... – Jonesome Reinstate Monica Sep 24 '15 at 20:38
-
@Jonesome - my example illustrates one master and more than one data node. If you don't want a node to ever act as a master, setting the property to false is fine. However, if you ever want your node to be able to become master, this property should not be touched. – KannarKK Sep 27 '15 at 16:50
-
@KannarKK But with ES, if you set "node.master: false" on every node except 1, if the master goes down, won't the whole cluster go down? Doesn't that defeat a major purpose of the cluster? Why not leave "node.master" out of the yml completely (which defaults it to true) so that if the master dies, another node can become the master? – Jonesome Reinstate Monica Sep 27 '15 at 18:04
-
@Jonesome - I have already included this info in the answer: ....Please note that by ES rule, if you have N nodes, then by convention, N/2+1 nodes should be master-eligible for fail-over. They may or may not be data nodes, though. Therefore, if you have more than one master, add all of their info to the list of hosts – KannarKK Sep 29 '15 at 09:38
-
Does autodiscovery work for anyone in the latest version of Elasticsearch? – ipeacocks Jan 02 '16 at 20:03
Elasticsearch 7 changed the configuration for cluster initialisation. What is important to note is that ES instances communicate internally using the transport layer (TCP) and not the HTTP protocol, which is normally used to perform operations on the indices. Below is a sample config for a 2-machine cluster.
Machine 1 config:-
cluster.name: cluster-new
node.name: node-1
node.master: true
node.data: true
bootstrap.memory_lock: true
network.host: 0.0.0.0
http.port: 9200
transport.host: 102.123.322.211
transport.tcp.port: 9300
discovery.seed_hosts: ["102.123.322.211:9300", "102.123.322.212:9300"]
cluster.initial_master_nodes:
- "node-1"
- "node-2”
Machine 2 config:-
cluster.name: cluster-new
node.name: node-2
node.master: true
node.data: true
bootstrap.memory_lock: true
network.host: 0.0.0.0
http.port: 9200
transport.host: 102.123.322.212
transport.tcp.port: 9300
discovery.seed_hosts: ["102.123.322.211:9300", "102.123.322.212:9300"]
cluster.initial_master_nodes:
- "node-1"
- "node-2”
cluster.name: This has to be the same across all the machines that are going to be part of the cluster.
node.name: Identifier for the ES instance. Defaults to the machine name if not given.
node.master: specifies whether this ES instance is going to be master-eligible or not.
node.data: specifies whether this ES instance is going to be a data node or not (hold data).
bootstrap.memory_lock: disable swapping. You can start the cluster without setting this flag, but it's recommended to set the lock. More info: https://www.elastic.co/guide/en/elasticsearch/reference/master/setup-configuration-memory.html
network.host: 0.0.0.0 if you want to expose the ES instance over the network. 0.0.0.0 is different from 127.0.0.1 (aka localhost or the loopback address). It means all IPv4 addresses on the machine. If the machine has multiple IP addresses and a server listens on 0.0.0.0, the client can reach the machine from any of those IPv4 addresses.
http.port: port on which this ES instance will listen for HTTP requests.
transport.host: The IPv4 address of the host (this will be used to communicate with other ES instances running on different machines). More info: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-transport.html
transport.tcp.port: 9300 (the port on which the machine will accept TCP connections)
discovery.seed_hosts: This was changed in recent versions. Initialise it with all the IPv4 addresses, with TCP port (important), of the ES instances that are going to be part of this cluster. This is going to be the same across all ES instances that are part of this cluster.
cluster.initial_master_nodes: node names (node.name) of the ES machines that are going to participate in master election. (Quorum-based decision making: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-quorums.html#modules-discovery-quorums)
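Once both instances are up, a quick sanity check that they actually formed one cluster (using the example address from the config above; any node's HTTP port will do) could look like:
curl 'http://102.123.322.211:9200/_cluster/health?pretty'
curl 'http://102.123.322.211:9200/_cat/nodes?v'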

I tried the steps that @KannarKK suggested on ES 2.0.2; however, I could not bring the cluster up and running. Eventually I figured something out: since I had set the TCP port number on the master, the slave configuration's discovery.zen.ping.unicast.hosts needs the master's port number along with its IP address (the TCP port) for discovery. So the following configuration works for me.
Node 1
cluster.name: mycluster
node.name: "node1"
node.master: true
node.data: true
http.port : 9200
tcp.port : 9300
discovery.zen.ping.multicast.enabled: false
# I think unicast.host on master is redundant.
discovery.zen.ping.unicast.hosts: ["node1.example.com"]
Node 2
cluster.name: mycluster
node.name: "node2"
node.master: false
node.data: true
http.port : 9201
tcp.port : 9301
discovery.zen.ping.multicast.enabled: false
# The port number of Node 1
discovery.zen.ping.unicast.hosts: ["node1.example.com:9300"]
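To confirm that node2 actually joined, a quick check against node1's HTTP port (using the hostname from the configs above) might be:
curl 'http://node1.example.com:9200/_cat/nodes?v'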
