Environment

  • ejabberd version: 20.04
  • Erlang version: Erlang (SMP,ASYNC_THREADS)(BEAM) emulator version 9.2
  • OS: Linux (Debian)
  • Installed from: source

Errors from crash.log

2022-02-08 22:42:45 =CRASH REPORT====
  crasher:
    initial call: pgsql_proto:init/1
    pid: <0.27318.6018>
    registered_name: []
    exception exit: {{init,{error,timeout}},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,349}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
    ancestors: ['ejabberd_sql_vhost1.xmpp_12','ejabberd_sql_sup_vhost1.xmpp',ejabberd_db_sup,ejabberd_sup,<0.87.0>]
    message_queue_len: 0
    messages: []
    links: []
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 997
  neighbours:

Bug description

I am trying to upgrade from ejabberd 20.04 to 20.07. My cluster has three nodes. The rolling upgrade of two nodes was successful. When node1 tries to leave the cluster for its upgrade, it gives the following error:

Failed RPC connection to the node 'ejabberd@xmpp1.node': timeout

When I run ejabberdctl status, the following is returned:

The node 'ejabberd@xmpp1.node' is started with status: started
Failed RPC connection to the node 'ejabberd@xmpp1.node': {'EXIT', {timeout, {gen_server,call, [application_controller, which_applications]}}}

In the Erlang shell, the node is still shown as part of the cluster:

nodes().
['ejabberd@xmpp3.node','ejabberd@xmpp2.node']

Could you please help me resolve this issue?

  • This same question was silently cross-posted in https://github.com/processone/ejabberd/issues/3764 – Badlop Feb 11 '22 at 12:20

2 Answers

This may be a dumb comment, but just in case it gives you some idea:

You are running the leave_cluster command on one of the nodes, and it doesn't connect correctly to the other one.

You could try running the command on another node.
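As a rough sketch of what that could look like: ejabberdctl's leave_cluster command takes the name of the node to remove, so it can be issued from a node that still responds. The helper name remove_stuck_node below is made up for illustration, and the node name in the example call is an assumption, since the question does not say where the command was run.

```shell
#!/bin/sh
# Sketch: run this on a node that still answers `ejabberdctl status`,
# not on the unresponsive one. Which node that is remains an assumption.

# remove_stuck_node NODE
# Hypothetical helper: asks the local ejabberd node to drop NODE from the
# cluster, then prints the cluster membership so the result can be checked.
remove_stuck_node() {
    stuck_node="$1"
    if [ -z "$stuck_node" ]; then
        echo "usage: remove_stuck_node 'ejabberd@host'" >&2
        return 2
    fi
    ejabberdctl leave_cluster "$stuck_node" || return 1
    ejabberdctl list_cluster
}

# Example call (node name is a placeholder):
#   remove_stuck_node 'ejabberd@xmpp1.node'
```

If the node being removed is completely hung, this may still time out, in which case the process on that node probably has to be stopped first.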

If that doesn't help, maybe there's some internal way to attempt to remove a node from the cluster...

But you should update your question and clarify what the node names are, on which node you attempt to perform the admin task, and exactly what method you are using.

Badlop

Thanks for your reply, and sorry for the late response. The issue happened on the first node after the upgrade of the other two nodes had completed successfully. The first node became unresponsive after the rolling upgrade of the last two nodes finished. We found that the reason for the failure of node 1 was too many failed SQL queries due to connection issues.

The node names are ejabberd@xmpp1.node, ejabberd@xmpp2.node and ejabberd@xmpp3.node.

To resolve the issue we had to kill the unresponsive ejabberd processes and restart ejabberd on the first node. We are continuing with further upgrades.