We are running an ES cluster with 9 nodes on 4 servers (3 servers with 2 nodes and 1 with 3). Recently some containers had to be removed and restarted. Since then we keep seeing the following error:

{"type": "server", "timestamp": "2020-08-20T09:31:13,392+0000", "level": "WARN",
"component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "cluster1",
"node.name": "es01",  "message": "master not discovered or elected yet, an election requires
 at least 7 nodes with ids from [6nIuCqHBTEiM471dsBjAYg, -GlEEYmdQ3CI54S3rtTqsg, jYayprGkSju4oj5qC7UyOA,
QsbFUrU4Q9Szi3ruHB2G6Q, 5tEfV9dLTauX7jIh26MV3w, hOp9eg0YSgiiHpuVu03egA,
ptHs2yWUSWGXaE9ol3UmEw, 9pcHljiMTfKMThywKDdY1g, HhjznjK7SNSoHDRmTKi-_A,
snezZek6Q_KxsKcPcEyfxQ, eFc37ORbTryayD2LxlB4Ew, WfCPBBPfSu2rDBHV8SlKsQ,
Pnv3MDHRTy-xdoGp58UELA], have discovered [{es01}{1bSSls3HT9OOxVLeKOJYpw}{ebwoJDOnT4-7gYJ1KDnstA}
{10.91.225.41}{10.91.225.41:9300}{dim}{rack_id=rack_one,
ml.machine_memory=101303377920, xpack.installed=true, ml.max_open_jobs=20},
{es05}{0YpL_BibRlCHVNMef4OXrw}{FduRNB5XS9Og6l2xKNEdYA}{10.91.225.43}{10.91.225.43:9300}{dim}
{rack_id=rack_one, ml.machine_memory=101303377920, ml.max_open_jobs=20, xpack.installed=true}, 
{es07}{AIadMpTfQ2uo9s5_f22KUw}{757FV3cdSjKvpW3IAb2K1Q}{10.91.225.44}{10.91.225.44:9300}{dim}
{rack_id=rack_two, ml.machine_memory=135100149760, ml.max_open_jobs=20, xpack.installed=true}, 
{es06}{WfCPBBPfSu2rDBHV8SlKsQ}{-ARlIlpKR0yNWlrIO-yMPg}{10.91.225.43}{10.91.225.43:9301}{dim}
{rack_id=rack_one, ml.machine_memory=101303377920, ml.max_open_jobs=20, xpack.installed=true}, 
{es10}{6nIuCqHBTEiM471dsBjAYg}{Bsrz17rvSwSMcU63FI9amg}{10.91.225.44}{10.91.225.44:9303}{dim}
{rack_id=rack_two, ml.machine_memory=135100149760, ml.max_open_jobs=20, xpack.installed=true}, 
{es09}{-GlEEYmdQ3CI54S3rtTqsg}{cRxPVrUsTQerMTFkagKGeA}{10.91.225.44}{10.91.225.44:9302}{dim}
{rack_id=rack_two, ml.machine_memory=135100149760, ml.max_open_jobs=20, xpack.installed=true}, 
{es08}{QsbFUrU4Q9Szi3ruHB2G6Q}{GOUZwCMFQhG1tQaGDEcVgg}{10.91.225.44}{10.91.225.44:9301}{dim}
{rack_id=rack_two, ml.machine_memory=135100149760, ml.max_open_jobs=20, xpack.installed=true}, 
{es02}{jYayprGkSju4oj5qC7UyOA}{whgoGztXR2ScxBS-nak2dA}{10.91.225.41}{10.91.225.41:9301}{dim}
{rack_id=rack_one, ml.machine_memory=101303377920, ml.max_open_jobs=20, xpack.installed=true}] 
which is not a quorum; discovery will continue using [10.91.225.42:9300, 10.91.225.43:9300,
 10.91.225.44:9300] from hosts providers and 
[{es01}{1bSSls3HT9OOxVLeKOJYpw}{ebwoJDOnT4-7gYJ1KDnstA}{10.91.225.41}{10.91.225.41:9300}{dim}
{rack_id=rack_one, ml.machine_memory=101303377920, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 77, 
last-accepted version 160963 in term 77"  }

If I understand this correctly, the node expects certain node IDs to be available for the election. Later on it actually discovers all nodes, but their IDs have changed (I am guessing during the ES container restarts), so they no longer match the expected ones and no master is elected.
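To check this guess against the log message above, here is a minimal Python sketch that compares the IDs the election asks for with the IDs of the nodes es01 reports as discovered. All IDs are copied by hand from that one log line. The variable names are my own, and the "majority of the listed IDs" rule is my assumption about how the required count of 7 is derived (13 listed IDs would give 7).

```python
# IDs from the "an election requires at least 7 nodes with ids from [...]" part of the log.
voting_ids = {
    "6nIuCqHBTEiM471dsBjAYg", "-GlEEYmdQ3CI54S3rtTqsg", "jYayprGkSju4oj5qC7UyOA",
    "QsbFUrU4Q9Szi3ruHB2G6Q", "5tEfV9dLTauX7jIh26MV3w", "hOp9eg0YSgiiHpuVu03egA",
    "ptHs2yWUSWGXaE9ol3UmEw", "9pcHljiMTfKMThywKDdY1g", "HhjznjK7SNSoHDRmTKi-_A",
    "snezZek6Q_KxsKcPcEyfxQ", "eFc37ORbTryayD2LxlB4Ew", "WfCPBBPfSu2rDBHV8SlKsQ",
    "Pnv3MDHRTy-xdoGp58UELA",
}

# Node name -> node ID (the first {...} after the name) from the "have discovered [...]" part.
discovered = {
    "es01": "1bSSls3HT9OOxVLeKOJYpw",
    "es05": "0YpL_BibRlCHVNMef4OXrw",
    "es07": "AIadMpTfQ2uo9s5_f22KUw",
    "es06": "WfCPBBPfSu2rDBHV8SlKsQ",
    "es10": "6nIuCqHBTEiM471dsBjAYg",
    "es09": "-GlEEYmdQ3CI54S3rtTqsg",
    "es08": "QsbFUrU4Q9Szi3ruHB2G6Q",
    "es02": "jYayprGkSju4oj5qC7UyOA",
}

# Assumption: the election needs a strict majority of the listed IDs.
required = len(voting_ids) // 2 + 1

# Discovered nodes whose ID is one of the expected IDs, and those whose ID is not.
matching = sorted(name for name, node_id in discovered.items() if node_id in voting_ids)
unknown = sorted(name for name, node_id in discovered.items() if node_id not in voting_ids)

print(f"required: {required} of {len(voting_ids)}")
print(f"matching: {len(matching)} {matching}")
print(f"discovered with an unexpected ID: {unknown}")
```

For this log line that gives 5 matching nodes (es02, es06, es08, es09, es10) against a required 7, while es01, es05 and es07 are discovered under IDs that are not in the expected list.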

Is my thinking correct? Any help would be much appreciated.

EDIT: We are also seeing this error:

{"type": "server", "timestamp": "2020-08-20T10:35:26,859+0000", "level": "ERROR", "component": "o.e.x.s.a.e.NativeUsersStore", "cluster.name": "cluster1", "node.name": "es03",  "message": "security index is unavailable. short circuiting retrieval of user [api]"  }
rok
  • How did you specify the seed hosts and the initial master nodes? – Val Aug 20 '20 at 10:20
  • Both are specified in docker-compose as follows: -discovery.seed_hosts=10.91.225.41:9300,10.91.225.42:9300,10.91.225.43:9300,10.91.225.44:9300 -cluster.initial_master_nodes=10.91.225.41:9300,10.91.225.42:9300,10.91.225.43:9300,10.91.225.44:9300 – rok Aug 20 '20 at 10:29
  • I'm asking because I'm reading that `at least 7 nodes` are expected... so I'm wondering where that comes from – Val Aug 20 '20 at 10:31
  • This log message also varies between nodes. Some nodes report 5 are required, some 6, 7... All of them actually discover all other nodes though, it's just the ids that don't match. – rok Aug 20 '20 at 10:49
  • Does this help? https://stackoverflow.com/a/61083120/4604579 – Val Aug 20 '20 at 10:51
