
The supervisor seems to fail silently when starting a child...

Here's the supervisor:

-module(gateway_sup).

-behaviour(supervisor).
-export([start_socket/0, init/1, start_link/1]).

-define(SSL_OPTIONS, [{active, once},
                      {backlog, 128},
                      {reuseaddr, true},
                      {packet, 0},
                      {cacertfile, "./ssl_key/server/gd_bundle.crt"},
                      {certfile, "./ssl_key/server/cert.pem"},
                      {keyfile, "./ssl_key/server/cert.key"},
                      {password, "**********"}
                     ]).

start_link(Port) ->
    Role = list_to_atom(atom_to_list(?MODULE) ++ lists:flatten(io_lib:format("~B", [Port]))),
    supervisor:start_link({local, Role}, ?MODULE, [Port]).

init([Port]) ->
    R = ssl:listen(Port, ?SSL_OPTIONS),
    LSocket = case R of
                  {ok, LSock} ->
                      LSock;
                  Res ->
                      io:fwrite("gateway_sup Error: ~p~n", [Res])
              end,
    spawn_link(fun empty_listeners/0),
    ChildSpec = [{socket,
                  {gateway_serv, start_link, [LSocket]},
                  temporary, 1000, worker, [gateway_serv]}
                ],
    {ok, {{simple_one_for_one, 3600, 3600},
          ChildSpec
         }}.

empty_listeners() ->
    io:fwrite("---------------------- empty_listeners~n"),
    [start_socket() || _ <- lists:seq(1,128)],
    ok.

start_socket() ->
    io:fwrite("++++++++++++++++++++++ start_socket~n"),
    supervisor:start_child(?MODULE, []).

And the gen_server:

-module(gateway_serv).

-behaviour(gen_server).
-export([start_link/1, init/1, handle_call/3, handle_cast/2, handle_info/2, code_change/3, terminate/2]).

start_link(LSocket) ->
    io:fwrite("#################~n"),
    gen_server:start_link(?MODULE, [LSocket], []).

init([LSocket]) ->
    io:fwrite("/////////////////~n"),
    gen_server:cast(self(), accept),
    {ok, #client{listenSocket=LSocket, pid=self()}}.

handle_cast(accept, G = #client{listenSocket=LSocket}) ->
    {ok, AcceptSocket} = ssl:transport_accept(LSocket),
    gateway_sup:start_socket(),
    case ssl:ssl_accept(AcceptSocket, 30000) of
        ok ->
            timer:send_after(10000, closingSocket),
            ssl:setopts(AcceptSocket, [{active, once}, {mode, list}, {packet, 0}]),
            {noreply, G#client{listenSocket=none, socket=AcceptSocket}};
        {error, _Reason} ->
            {stop, normal, G}
    end;
handle_cast(_, G) ->
    {noreply, G}.

The gen_server's start_link/1 is apparently never called (checked with an io:fwrite).

I can't seem to find out why...

TheSquad

2 Answers

When you register the supervisor you use:

Role = list_to_atom(atom_to_list(?MODULE) ++ lists:flatten(io_lib:format("~B", [Port]))),

Therefore, when you call:

start_socket() ->
    io:fwrite("++++++++++++++++++++++ start_socket~n"),
    supervisor:start_child(?MODULE, []).

you are calling a supervisor that does not exist.

You should call it as:

supervisor:start_child(Role, []).

You can pass Role as a parameter to the function.
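A minimal sketch of the fix, assuming ?MODULE in the question is gateway_sup: with a hypothetical Port of 8443 the supervisor is registered as gateway_sup8443, while start_socket/0 asks for gateway_sup, so supervisor:start_child/2 exits with noproc inside the process spawned by empty_listeners. Since that process is not one of the supervisor's children, the failure is easy to miss. The module name name_demo and the functions sup_name/1 and start_worker/0 are made up for illustration, and a trivial idle process stands in for gateway_serv so the block runs without SSL. In the question's code gateway_serv also calls gateway_sup:start_socket(), so it would need the name as well (for example as an extra argument in the child spec); that part is not shown here.

```erlang
%% Hypothetical demo module (not the asker's gateway_sup): same naming
%% scheme, but a trivial worker instead of gateway_serv so it runs without SSL.
-module(name_demo).
-behaviour(supervisor).
-export([sup_name/1, start_link/1, init/1, start_socket/1, start_worker/0]).

%% Registered name built like Role in gateway_sup:start_link/1
%% (integer_to_list/1 instead of io_lib:format("~B", ...), same result),
%% e.g. sup_name(8443) -> name_demo8443.
sup_name(Port) ->
    list_to_atom(atom_to_list(?MODULE) ++ integer_to_list(Port)).

start_link(Port) ->
    supervisor:start_link({local, sup_name(Port)}, ?MODULE, []).

init([]) ->
    Worker = {worker, {?MODULE, start_worker, []},
              temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 3600, 3600}, [Worker]}}.

%% Fixed start_socket: takes the registered name instead of using ?MODULE.
start_socket(Name) ->
    supervisor:start_child(Name, []).

%% Stand-in for gateway_serv:start_link/1: an idle process linked to the supervisor.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

After name_demo:start_link(8443), calling supervisor:start_child(name_demo, []) exits with noproc, while name_demo:start_socket(name_demo:sup_name(8443)) returns {ok, Pid}.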

user601836

Something seems strange to me: you launch empty_listeners, which calls start_socket(), which calls supervisor:start_child, from within the init function of the supervisor. At that point the supervisor has not finished its initialization phase, so there is a race between the processes that ask the supervisor to start children and the supervisor itself.

I think that this code should be outside the init function:

  • First start the supervisor using start_link(Port),
  • and when it returns call the function start_socket().
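The two steps listed above could look roughly like this. It is a sketch under stated assumptions: the module two_step_demo and its functions start/1 and start_worker/0 are hypothetical, and an idle process stands in for gateway_serv. Because the caller gets the supervisor's pid back from start_link, it can address the supervisor by pid, which also sidesteps the registered-name problem.

```erlang
%% Hypothetical demo module: the supervisor's init/1 starts no children;
%% the caller adds them once start_link/0 has returned.
-module(two_step_demo).
-behaviour(supervisor).
-export([start/1, start_link/0, init/1, start_worker/0]).

start(N) ->
    %% Step 1: start the supervisor; start_link/0 returns only after init/1 is done.
    {ok, Sup} = start_link(),
    %% Step 2: ask it for N children, addressed by pid so no name is needed.
    [{ok, _} = supervisor:start_child(Sup, []) || _ <- lists:seq(1, N)],
    {ok, Sup}.

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    Worker = {worker, {?MODULE, start_worker, []},
              temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 3600, 3600}, [Worker]}}.

%% Stand-in for gateway_serv:start_link/1: an idle process linked to the supervisor.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```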

I have written an application that uses this pattern, with two levels of supervisors:

main supervisor (one_for_all strategy)
|                         |
|                         |
v                         v
application   ------->    supervisor (simple_one_for_one strategy)
server      start_child   worker factory
                          |
                          |
                          v*
                          many children
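A minimal sketch of the two-level layout in the diagram, assuming a top supervisor with two children started in order: the worker factory first, then the application server. Everything here is hypothetical: the module two_level_demo, the registered names two_level_top and two_level_factory, and the functions start_factory/0, start_server/1, server_init/2 and start_worker/0. The "application server" is a plain proc_lib process that asks the factory for three workers before acknowledging its own start; a real server would more likely be a gen_server. Since a supervisor starts its children in list order and each start_link returns only after that child has initialized, the factory is fully up before the server calls start_child.

```erlang
%% Hypothetical demo module for the two-level supervisor layout.
-module(two_level_demo).
-behaviour(supervisor).
-export([start_link/0, start_factory/0, start_server/1, server_init/2,
         start_worker/0, init/1]).

%% Main supervisor (one_for_all).
start_link() ->
    supervisor:start_link({local, two_level_top}, ?MODULE, top).

%% Worker factory (simple_one_for_one), registered so the server can find it.
start_factory() ->
    supervisor:start_link({local, two_level_factory}, ?MODULE, factory).

init(top) ->
    %% Children start in list order: the factory is fully up before the server.
    Children = [{factory, {?MODULE, start_factory, []},
                 permanent, infinity, supervisor, [?MODULE]},
                {server, {?MODULE, start_server, [3]},
                 permanent, 1000, worker, [?MODULE]}],
    {ok, {{one_for_all, 1, 5}, Children}};
init(factory) ->
    Worker = {worker, {?MODULE, start_worker, []},
              temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 3600, 3600}, [Worker]}}.

%% Stand-in for the application server: asks the factory for N workers,
%% then acknowledges its start to the top supervisor and idles.
start_server(N) ->
    proc_lib:start_link(?MODULE, server_init, [self(), N]).

server_init(Parent, N) ->
    [{ok, _} = supervisor:start_child(two_level_factory, [])
     || _ <- lists:seq(1, N)],
    proc_lib:init_ack(Parent, {ok, self()}),
    receive stop -> ok end.

%% Stand-in for a real worker: an idle process linked to the factory.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

After two_level_demo:start_link() returns, supervisor:which_children(two_level_factory) lists the three workers.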

EDIT: Forget this race condition.

I ran a test that introduced a delay before the end of the init function, and saw that the start_child call waits for init to finish, so nothing is lost. The OTP developers have been even more cautious than I imagined...
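A small sketch that reproduces this kind of test, under stated assumptions: the module init_delay_demo, the 500 ms delay, start_worker/0 and the {start_child_result, _} message are all made up. A helper spawned in init/1 calls start_child on the supervisor's registered name while init/1 is still sleeping. This works because supervisor:start_link({local, Name}, ...) registers Name before calling init/1, and the call message simply waits in the supervisor's mailbox until init/1 returns.

```erlang
%% Hypothetical demo module: start_child is called while init/1 is still running.
-module(init_delay_demo).
-behaviour(supervisor).
-export([start_link/1, init/1, start_worker/0]).

%% ReportTo receives the result of the start_child call made during init/1.
start_link(ReportTo) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, ReportTo).

init(ReportTo) ->
    %% Same shape as the question: a linked helper asks for a child during init.
    spawn_link(fun() ->
                   Result = supervisor:start_child(?MODULE, []),
                   ReportTo ! {start_child_result, Result}
               end),
    %% Artificial delay; the helper's call stays queued until init/1 returns.
    timer:sleep(500),
    Worker = {worker, {?MODULE, start_worker, []},
              temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 3600, 3600}, [Worker]}}.

%% Stand-in for a real worker: an idle process linked to the supervisor.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```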

Pascal
  • It seems to me he is using the strategy proposed in LYSE. Am I wrong? I have used this technique many times and never ran into any problem. – user601836 Mar 31 '13 at 20:23
  • Exactly, I found it at http://learnyousomeerlang.com/buckets-of-sockets#sockserv-revisited and it works perfectly with other code I wrote... But here, I don't know, I can't seem to find the problem. – TheSquad Mar 31 '13 at 20:26
  • As user601836 answered, a parameter is missing to reach your supervisor; I tried your code with this modification and it works fine. But I really think you should not call start_child from the init function of the supervisor: even if it works once, you cannot be sure it will always work. If you want to avoid another module, you can cast a message to the supervisor that will spawn the function after the init phase, in the cast callback. – Pascal Mar 31 '13 at 21:15