We are running an ES cluster with 9 nodes on 4 servers (3 servers with 2 nodes and 1 with 3). Recently some containers had to be removed and restarted. Since then we keep seeing the following error:
{"type": "server", "timestamp": "2020-08-20T09:31:13,392+0000", "level": "WARN",
"component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "cluster1",
"node.name": "es01", "message": "master not discovered or elected yet, an election requires
at least 7 nodes with ids from [6nIuCqHBTEiM471dsBjAYg, -GlEEYmdQ3CI54S3rtTqsg, jYayprGkSju4oj5qC7UyOA,
QsbFUrU4Q9Szi3ruHB2G6Q, 5tEfV9dLTauX7jIh26MV3w, hOp9eg0YSgiiHpuVu03egA,
ptHs2yWUSWGXaE9ol3UmEw, 9pcHljiMTfKMThywKDdY1g, HhjznjK7SNSoHDRmTKi-_A,
snezZek6Q_KxsKcPcEyfxQ, eFc37ORbTryayD2LxlB4Ew, WfCPBBPfSu2rDBHV8SlKsQ,
Pnv3MDHRTy-xdoGp58UELA], have discovered [{es01}{1bSSls3HT9OOxVLeKOJYpw}{ebwoJDOnT4-7gYJ1KDnstA}
{10.91.225.41}{10.91.225.41:9300}{dim}{rack_id=rack_one,
ml.machine_memory=101303377920, xpack.installed=true, ml.max_open_jobs=20},
{es05}{0YpL_BibRlCHVNMef4OXrw}{FduRNB5XS9Og6l2xKNEdYA}{10.91.225.43}{10.91.225.43:9300}{dim}
{rack_id=rack_one, ml.machine_memory=101303377920, ml.max_open_jobs=20, xpack.installed=true},
{es07}{AIadMpTfQ2uo9s5_f22KUw}{757FV3cdSjKvpW3IAb2K1Q}{10.91.225.44}{10.91.225.44:9300}{dim}
{rack_id=rack_two, ml.machine_memory=135100149760, ml.max_open_jobs=20, xpack.installed=true},
{es06}{WfCPBBPfSu2rDBHV8SlKsQ}{-ARlIlpKR0yNWlrIO-yMPg}{10.91.225.43}{10.91.225.43:9301}{dim}
{rack_id=rack_one, ml.machine_memory=101303377920, ml.max_open_jobs=20, xpack.installed=true},
{es10}{6nIuCqHBTEiM471dsBjAYg}{Bsrz17rvSwSMcU63FI9amg}{10.91.225.44}{10.91.225.44:9303}{dim}
{rack_id=rack_two, ml.machine_memory=135100149760, ml.max_open_jobs=20, xpack.installed=true},
{es09}{-GlEEYmdQ3CI54S3rtTqsg}{cRxPVrUsTQerMTFkagKGeA}{10.91.225.44}{10.91.225.44:9302}{dim}
{rack_id=rack_two, ml.machine_memory=135100149760, ml.max_open_jobs=20, xpack.installed=true},
{es08}{QsbFUrU4Q9Szi3ruHB2G6Q}{GOUZwCMFQhG1tQaGDEcVgg}{10.91.225.44}{10.91.225.44:9301}{dim}
{rack_id=rack_two, ml.machine_memory=135100149760, ml.max_open_jobs=20, xpack.installed=true},
{es02}{jYayprGkSju4oj5qC7UyOA}{whgoGztXR2ScxBS-nak2dA}{10.91.225.41}{10.91.225.41:9301}{dim}
{rack_id=rack_one, ml.machine_memory=101303377920, ml.max_open_jobs=20, xpack.installed=true}]
which is not a quorum; discovery will continue using [10.91.225.42:9300, 10.91.225.43:9300,
10.91.225.44:9300] from hosts providers and
[{es01}{1bSSls3HT9OOxVLeKOJYpw}{ebwoJDOnT4-7gYJ1KDnstA}{10.91.225.41}{10.91.225.41:9300}{dim}
{rack_id=rack_one, ml.machine_memory=101303377920, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 77,
last-accepted version 160963 in term 77" }
If I understand this correctly, the election expects a quorum of specific node IDs to be available. The node does discover other nodes, but their IDs appear to have changed (I am guessing during the ES container restarts?), so they no longer match the expected IDs and no master is elected.
Is my thinking correct? Any help would be much appreciated.
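To check this guess against the log, here is a minimal Python sketch that compares the node IDs the election is waiting for with the IDs of the nodes that were actually discovered. It rests on two assumptions: that the first `{...}` after each node name in the "have discovered" list is the persistent node ID, and that the quorum is a simple majority of the listed IDs (which agrees with the "at least 7 nodes" out of 13 in the message). The function name `check_quorum` is only illustrative.

```python
# IDs copied from the "an election requires at least 7 nodes with ids from [...]" part of the log.
VOTING_CONFIG = {
    "6nIuCqHBTEiM471dsBjAYg", "-GlEEYmdQ3CI54S3rtTqsg", "jYayprGkSju4oj5qC7UyOA",
    "QsbFUrU4Q9Szi3ruHB2G6Q", "5tEfV9dLTauX7jIh26MV3w", "hOp9eg0YSgiiHpuVu03egA",
    "ptHs2yWUSWGXaE9ol3UmEw", "9pcHljiMTfKMThywKDdY1g", "HhjznjK7SNSoHDRmTKi-_A",
    "snezZek6Q_KxsKcPcEyfxQ", "eFc37ORbTryayD2LxlB4Ew", "WfCPBBPfSu2rDBHV8SlKsQ",
    "Pnv3MDHRTy-xdoGp58UELA",
}

# Node name -> node ID, copied from the "have discovered [...]" part of the log.
# Assumption: the first {...} after the node name is the persistent node ID.
DISCOVERED = {
    "es01": "1bSSls3HT9OOxVLeKOJYpw",
    "es05": "0YpL_BibRlCHVNMef4OXrw",
    "es07": "AIadMpTfQ2uo9s5_f22KUw",
    "es06": "WfCPBBPfSu2rDBHV8SlKsQ",
    "es10": "6nIuCqHBTEiM471dsBjAYg",
    "es09": "-GlEEYmdQ3CI54S3rtTqsg",
    "es08": "QsbFUrU4Q9Szi3ruHB2G6Q",
    "es02": "jYayprGkSju4oj5qC7UyOA",
}


def check_quorum(voting_config, discovered):
    """Compare discovered node IDs with the expected voting configuration."""
    # Assumption: a quorum is a strict majority of the voting configuration.
    required = len(voting_config) // 2 + 1
    matching = sorted(n for n, node_id in discovered.items() if node_id in voting_config)
    unknown = sorted(n for n, node_id in discovered.items() if node_id not in voting_config)
    return {
        "required": required,
        "matching": matching,
        "unknown": unknown,
        "has_quorum": len(matching) >= required,
    }


result = check_quorum(VOTING_CONFIG, DISCOVERED)
print("votes required:", result["required"])
print("discovered nodes with an expected ID:", result["matching"])
print("discovered nodes with an unexpected ID:", result["unknown"])
print("quorum reached:", result["has_quorum"])
```

For the log above this reports 5 discovered nodes with an expected ID (es02, es06, es08, es09, es10) and 3 with an unexpected ID (es01, es05, es07), against 7 required votes. So, under these assumptions, only some of the discovered nodes came back with new IDs, and the remaining 8 expected IDs do not show up among the discovered nodes at all.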
EDIT: We are also seeing this error:
{"type": "server", "timestamp": "2020-08-20T10:35:26,859+0000", "level": "ERROR", "component": "o.e.x.s.a.e.NativeUsersStore", "cluster.name": "cluster1", "node.name": "es03", "message": "security index is unavailable. short circuiting retrieval of user [api]" }