RabbitMQ: start_app always fails, service restart is OK?

Question

Apologies if this is a stupid question, first week on MQ and I've been thrown in at the deep end.

The RabbitMQ version is 3.7.6, the OS is Red Hat 7.4 (Maipo), and Erlang is 9.3.2.

We have two RabbitMQ servers in a cluster; both are AWS instances. When they are spun up, they build and join a cluster quite happily. However, if I run stop_app, reset, start_app, the start command fails, claiming the address is already in use:

{throw:{could_not_start,rabbitmq_management, {rabbitmq_management, {bad_return, {{rabbit_mgmt_app,start,[normal,[]]}, {'EXIT', {{could_not_start_listener, [{port,35672}], {shutdown, {failed_to_start_child,ranch_acceptors_sup, {listen_error,rabbit_web_dispatch_sup_35672,eaddrinuse}}}},}

The port is definitely not in use by another service: a simple port check during the start sees the port go from closed to open (briefly) and then closed again. The only way I have found to resolve this is to restart the service, after which the node comes back up.
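A connect-style check only shows whether something is accepting connections at that instant. Since the failure is eaddrinuse at bind time, a bind test may be more telling. A minimal sketch in Python, assuming it is run on the broker host itself; the helper name can_bind is hypothetical and not part of RabbitMQ:

```python
import socket


def can_bind(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP listener could be opened on host:port right now.

    Hypothetical helper for illustration. It tries to bind and listen,
    which is the step that fails with eaddrinuse in the log above.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE (port taken) or EACCES (privileged port).
            return False
        sock.listen(1)
        return True


if __name__ == "__main__":
    # 35672 is the management port from the config in this question.
    print(can_bind(35672))
```

Running this repeatedly while start_app executes would show whether some other process holds the port during that window, even if nothing is listening before or after.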

I have also changed the port in the config file and tried the start again, and I get exactly the same error.

I do have another issue but would like to solve this one first.

These are the contents of rabbitmq-server.config:

% This file managed by Puppet
% Template Path: rabbitmq/templates/rabbitmq.config
[
  {rabbit, [
    {cluster_nodes, {['rabbit@server1', 'rabbit@server2'], disc}},
    {cluster_partition_handling, ignore},
    {tcp_listen_options, [
         {keepalive,     true},
         {backlog,       128},
         {nodelay,       true},
         {linger,        {true, 0}},
         {exit_on_close, false}
    ]},
    {log_levels, [{connection, debug}]},
    {loopback_users, []},
    {default_user, <<"admin">>},
    {default_pass, <<"password">>}
  ]},
  {kernel, []},
  {rabbitmq_management, [
    {listener, [
      {port, 35672}
    ]}
  ]}
].
% EOF

Debugging / Log:

2018-06-14 14:09:35.712 [info] <0.33.0> Application rabbitmq_management_agent started on node rabbit@server2
2018-06-14 14:09:35.712 [debug] <0.1152.0> Supervisor rabbit_web_dispatch_sup started rabbit_web_dispatch_registry:start_link() at pid <0.1153.0>
2018-06-14 14:09:35.712 [debug] <0.1152.0> Supervisor rabbit_web_dispatch_sup started gen_event:start_link({local,webmachine_log_event}) at pid <0.1154.0>
2018-06-14 14:09:35.712 [info] <0.33.0> Application rabbitmq_web_dispatch started on node rabbit@server2
2018-06-14 14:09:35.712 [info] <0.33.0> Application amqp_client started on node rabbit@server2
2018-06-14 14:09:35.741 [debug] <0.1162.0> Supervisor {<0.1162.0>,ranch_listener_sup} started ranch_conns_sup:start_link(rabbit_web_dispatch_sup_35672, worker, 5000, ranch_tcp, 5000, cowboy_clear) at pid <0.1163.0>
2018-06-14 14:09:35.741 [error] <0.1164.0> Failed to start Ranch listener rabbit_web_dispatch_sup_35672 in ranch_tcp:listen([{port,35672}]) for reason eaddrinuse (address already in use)
2018-06-14 14:09:35.741 [error] <0.1164.0> CRASH REPORT Process <0.1164.0> with 0 neighbours exited with reason: {listen_error,rabbit_web_dispatch_sup_35672,eaddrinuse} in ranch_acceptors_sup:listen_error/4 line 59
2018-06-14 14:09:35.741 [error] <0.1162.0> Supervisor {<0.1162.0>,ranch_listener_sup} had child ranch_acceptors_sup started with ranch_acceptors_sup:start_link(rabbit_web_dispatch_sup_35672, 100, ranch_tcp, [{port,35672}]) at undefined exit with reason {listen_error,rabbit_web_dispatch_sup_35672,eaddrinuse} in context start_error
2018-06-14 14:09:35.742 [error] <0.1153.0> ** Generic server rabbit_web_dispatch_registry terminating

Any suggestions welcome...

Thanks

Sorry Martin, it always helps :). MQVersion is 3.7.6, the OS is Redhat 7.4 (Maipo) — Dave Shaw, Jun 14 '18 at 11:53

score 1 · Answer 1 · answered Jun 18 '18 at 06:50

So after much digging I've found the answer.

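The guys who did the initial setup put the management port on 35672, and it appears that this falls in a range RabbitMQ reserves for its CLI tools (35672-35682 with default settings, according to the networking guide). When the service starts, it can bind to this port. With stop_app and start_app, however, the likely explanation is that rabbitmqctl itself takes a port from this range while the command runs, so the management listener collides with the CLI tool rather than with another service.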

I've switched the port back down to 15672 where it should be. Hope this helps someone else in the future.
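To check whether some other chosen port would hit the same problem, the range can be computed. The RabbitMQ networking guide describes the CLI tool ports as the distribution port + 10000 through + 10010, where the distribution port defaults to the AMQP port + 20000. A rough sketch in Python, assuming those defaults have not been overridden; the function names are hypothetical:

```python
def cli_tool_port_range(amqp_port: int = 5672) -> range:
    """Ports the CLI tools may bind, assuming default RabbitMQ port offsets."""
    dist_port = amqp_port + 20000  # inter-node distribution port (25672 by default)
    # Inclusive range: dist_port + 10000 through dist_port + 10010.
    return range(dist_port + 10000, dist_port + 10011)


def conflicts_with_cli_tools(port: int, amqp_port: int = 5672) -> bool:
    """Return True if a listener on `port` could collide with rabbitmqctl."""
    return port in cli_tool_port_range(amqp_port)


if __name__ == "__main__":
    for candidate in (35672, 15672):
        print(candidate, conflicts_with_cli_tools(candidate))
```

With the default AMQP port, 35672 lies inside the computed range and 15672 does not, which matches the behaviour seen here.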
