0

I have configured a Riak cluster and all is well. I configured ejabberd.yml to talk to it but the last few lines I see in the logs are as below and I can't figure out how to debug it any further.

Anyone able to help?

Change ejabberd install user to root...
Missing ssl cert for your host. Generating snakeoil ssl cert for xmpp.alchemy.local...
Generating a 4096 bit RSA private key
..............................................++
..............................................++
writing new private key to '/tmp/selfsigned.key'
-----
Writing ssl cert and private key to '/opt/ejabberd/ssl/host.pem'...
Missing ssl cert for your xmpp domain. Generating snakeoil ssl cert for alchemy.local...
Generating a 4096 bit RSA private key
................................................++
........................++
writing new private key to '/tmp/selfsigned.key'
-----
Writing ssl cert and private key to '/opt/ejabberd/ssl/alchemy.local.pem'...
Generating ejabberd config file...
Generating ejabberdctl config file...
Starting ejabberd...
tail: cannot open ‘/var/log/ejabberd/crash.log’ for reading: No such file or directory
tail: cannot open ‘/var/log/ejabberd/error.log’ for reading: No such file or directory
tail: cannot open ‘/var/log/ejabberd/erlang.log’ for reading: No such file or directory
tail: ‘/var/log/ejabberd/crash.log’ has been replaced with a remote file. giving up on this name
tail: ‘/var/log/ejabberd/error.log’ has been replaced with a remote file. giving up on this name
09:57:16.424 [info] Application lager started on node ejabberd@xmpp
09:57:16.431 [info] Application crypto started on node ejabberd@xmpp
09:57:16.442 [info] Application sasl started on node ejabberd@xmpp
09:57:16.452 [info] Application asn1 started on node ejabberd@xmpp
09:57:16.452 [info] Application public_key started on node ejabberd@xmpp
09:57:16.466 [info] Application ssl started on node ejabberd@xmpp
09:57:16.474 [info] Application p1_yaml started on node ejabberd@xmpp
09:57:16.485 [info] Application p1_tls started on node ejabberd@xmpp
09:57:16.493 [info] Application p1_xml started on node ejabberd@xmpp
09:57:16.507 [info] Application p1_stringprep started on node ejabberd@xmpp
09:57:16.511 [info] Application p1_zlib started on node ejabberd@xmpp
09:57:16.519 [info] Application cache_tab started on node ejabberd@xmpp
09:57:16.677 [info] Application mnesia started on node ejabberd@xmpp
09:57:17.179 [info] Application inets started on node ejabberd@xmpp
09:57:17.558 [notice] Changed loglevel of //var/log/ejabberd/ejabberd.log to debug
09:57:17.567 [debug] Supervisor ejabberd_sup started ejabberd_hooks:start_link() at pid <0.239.0>
09:57:17.575 [debug] Supervisor kernel_safe_sup started pg2:start_link() at pid <0.241.0>
09:57:17.575 [debug] Supervisor ejabberd_sup started ejabberd_node_groups:start_link() at pid <0.240.0>
09:57:17.575 [debug] Supervisor ejabberd_sup started ejabberd_system_monitor:start_link() at pid <0.242.0>
09:57:17.577 [debug] Supervisor ejabberd_sup started ejabberd_router:start_link() at pid <0.243.0>
09:57:17.579 [debug] Supervisor ejabberd_sup started ejabberd_router_multicast:start_link() at pid <0.249.0>
09:57:17.582 [debug] Supervisor ejabberd_sup started ejabberd_s2s:start_link() at pid <0.255.0>
09:57:17.584 [debug] Supervisor ejabberd_sup started ejabberd_local:start_link() at pid <0.265.0>
09:57:17.585 [debug] The option captcha_cmd is not configured, but some module wants to use the CAPTCHA feature.
09:57:17.587 [debug] Supervisor ejabberd_sup started ejabberd_captcha:start_link() at pid <0.271.0>
09:57:17.587 [debug] Supervisor ejabberd_sup started ejabberd_tmp_sup:start_link(ejabberd_s2s_in_sup, ejabberd_s2s_in) at pid <0.273.0>
09:57:17.587 [debug] Supervisor ejabberd_sup started ejabberd_tmp_sup:start_link(ejabberd_s2s_out_sup, ejabberd_s2s_out) at pid <0.274.0>
09:57:17.587 [debug] Supervisor ejabberd_sup started ejabberd_tmp_sup:start_link(ejabberd_service_sup, ejabberd_service) at pid <0.275.0>
09:57:17.587 [debug] Supervisor ejabberd_sup started ejabberd_tmp_sup:start_link(ejabberd_iq_sup, gen_iq_handler) at pid <0.276.0>
09:57:17.588 [debug] Supervisor ejabberd_sup started ejabberd_tmp_sup:start_link(ejabberd_frontend_socket_sup, ejabberd_frontend_socket) at pid <0.277.0>
09:57:17.588 [debug] Supervisor ejabberd_sup started ejabberd_listener:start_link() at pid <0.278.0>
09:57:17.591 [info] Application protobuffs started on node ejabberd@xmpp
09:57:17.591 [info] Application riak_pb started on node ejabberd@xmpp
09:57:17.591 [info] Application riakc started on node ejabberd@xmpp
09:57:17.600 [debug] Supervisor inet_gethost_native_sup started undefined at pid <0.289.0>
09:57:17.600 [debug] Supervisor kernel_safe_sup started inet_gethost_native:start_link() at pid <0.288.0>

I managed to get a bit more detail by ejabberdctl debug. It seems to be having an issue with the node name, but the exact same config works fine if i don't add Riak to the ejabberd.yml file.

tail: cannot open ‘/var/log/ejabberd/crash.log’ for reading: No such file or directory
tail: cannot open ‘/var/log/ejabberd/error.log’ for reading: No such file or directory
tail: cannot open ‘/var/log/ejabberd/erlang.log’ for reading: No such file or directory
{error_logger,{{2016,2,4},{19,30,38}},"Invalid node name: ~p~n",['debug-not a tty-ejabberd@xmpp']}
{error_logger,{{2016,2,4},{19,30,38}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.20.0>},{registered_name,[]},{error_info,{exit,{error,badarg},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,322}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}},{ancestors,[net_sup,kernel_sup,<0.10.0>]},{messages,[]},{links,[<0.17.0>]},{dictionary,[{longnames,false}]},{trap_exit,true},{status,running},{heap_size,987},{stack_size,27},{reductions,902}],[]]}
{error_logger,{{2016,2,4},{19,30,38}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},{mfargs,{net_kernel,start_link,[['debug-not a tty-ejabberd',shortnames]]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}
{error_logger,{{2016,2,4},{19,30,38}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}},{offender,[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
{error_logger,{{2016,2,4},{19,30,38}},crash_report,[[{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{pid,<0.9.0>},{registered_name,[]},{error_info,{exit,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,133}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}},{ancestors,[<0.8.0>]},{messages,[{'EXIT',<0.10.0>,normal}]},{links,[<0.8.0>,<0.7.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,376},{stack_size,27},{reductions,117}],[]]}
{error_logger,{{2016,2,4},{19,30,38}},std_info,[{application,kernel},{exited,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}}},{type,permanent}]}
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}}}"}

Crash dump was written to: //var/log/ejabberd/erl_crash_20160204-193037.dump
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{k

**** Update ****

The reason that this was happening was because I hadn't specified the --net network connection in my Docker run command and so it couldn't find the network. Unfortunately there is no debug to indicate this it just hangs forever (this would happen in the general case where the network that hosts the Riak cluster cannot be found and is a nightmare to debug).

I still don't know why the "debug" flag causes tty to be added to the node name but i suspect it is somehow related as when i fixed it that also stopped appearing.

Anyone interested in knowing the detail can view it here https://github.com/rroemhild/docker-ejabberd/issues/81#issuecomment-181133336

stevenlivz
  • 21
  • 1
  • 5
  • provide more detailed log of ejabberd and riak. – Sunil Singh Feb 04 '16 at 13:32
  • Updated with the entire log but the last few lines are the only diff between a successful installation against Mnesia and the Riak ones :S – stevenlivz Feb 04 '16 at 13:44
  • you installed ejabberd from source code or binary installer ? you should reinstall ejabberd as root user. – Sunil Singh Feb 04 '16 at 13:51
  • It is installed onto a Docker container via https://github.com/rroemhild/docker-ejabberd which uses the source & also installs as root. – stevenlivz Feb 04 '16 at 13:59
  • Are there any additional debug options I can use to try and figure out why it just stops? My suspicion with with the riak client lib but no idea how to try and figure out what it is doing, Can't see any output at all in the Riak cluster. – stevenlivz Feb 04 '16 at 14:00
  • check these links and reinstall ejabberd http://docs.ejabberd.im/admin/guide/installation/#installing-ejabberd-from-source-code http://docs.ejabberd.im/admin/guide/installation/#installing-ejabberd-with-binary-installer – Sunil Singh Feb 04 '16 at 14:03
  • Thanks - i've been through them both and reinstalled a number of times. Thanks for your help. – stevenlivz Feb 04 '16 at 14:09
  • net_sup is trying to use 'debug-not a tty-ejabberd@xmpp', which is invalid, so it fails. As net_sup is a top level app, the VM shuts down when it exits. 'not a tty' look a lot like the shell command `tty`. Find where `tty` is called and give it a more sane default – Joe Feb 04 '16 at 23:58
  • Thanks @Joe - i've been trying for a few hours to figure out just where on earth this is getting set. Thanks for the heads up though - i will dig deeper into that and try and isolate it. The problem is is only happens when i add some Riak config in which doesn't touch the node name. – stevenlivz Feb 05 '16 at 07:00

1 Answers1

0

You are starting the Erlang VM with an invalid node name:

{error_logger,{{2016,2,4},{19,30,38}},"Invalid node name: ~p~n",['debug-not a tty-ejabberd@xmpp']}

Check where you have configured this: 'debug-not a tty-ejabberd@xmpp', as it is indeed not a valid Erlang VM cluster node name.

Mickaël Rémond
  • 9,035
  • 1
  • 24
  • 44
  • Thanks @Mickaël - i am debugging that just now. I have stripped away most of the script and focusing right in on this. – stevenlivz Feb 05 '16 at 23:03
  • Do you know at what point this is may be assigned. I seem to have all the config set up correctly but is it only when i run `ejabberdctl debug` that I get this (i am trying to debug why it doesn't connect to the Riak cluster but this is stopping me from finding that out). If i don't use debug but instead using live, it launches fine (but alas doesn't connect to Riak). – stevenlivz Feb 06 '16 at 11:57