edited at 2015-11-25 02:10

My ejabberd version is 14.12 and Erlang is R17B, so the check below does not do what it was meant to: on R17B, erlang:system_info(otp_release) returns "17", and the string comparison "17" >= "R13B" is false.

ejabberd_listener.erl

                SockOpts2 =
                try erlang:system_info(otp_release) >= "R13B" of
                    true -> [{send_timeout_close, true} | SockOpts];
                    false -> SockOpts
                catch
                     _:_ -> []
                end,
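The check above is a plain string comparison, so any release name that starts with a digit sorts before "R13B". A minimal sketch of a check that handles both naming schemes follows. The module name `otp_release_check` and its functions are hypothetical and not part of ejabberd, and comparing only the major number against 13 is an assumption that mirrors the "R13B" threshold in the snippet.

```erlang
-module(otp_release_check).
-export([supports_send_timeout_close/0, supports_send_timeout_close/1]).

%% Check the running VM.
supports_send_timeout_close() ->
    supports_send_timeout_close(erlang:system_info(otp_release)).

%% Check a given release string, e.g. "R13B", "R16B03-1" or "17".
%% Assumption: comparing the major number against 13 is close enough
%% to the original ">= R13B" intent.
supports_send_timeout_close(Release) ->
    major_release(Release) >= 13.

%% Strip the leading "R" used by old release names, then read the
%% leading integer; anything unparsable counts as release 0.
major_release([$R | Rest]) ->
    major_release(Rest);
major_release(Release) ->
    case string:to_integer(Release) of
        {Major, _Rest} when is_integer(Major) -> Major;
        _ -> 0
    end.
```

With this, `otp_release_check:supports_send_timeout_close("17")` and `otp_release_check:supports_send_timeout_close("R13B")` both take the `true` branch, while `"R12B"` does not.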

I added {send_timeout_close, true} manually to the listen options, and my problem seems to be solved: the socket is now closed as soon as the send timeout fires, so follow-up messages in the queue that try to send get an {error,enotconn} response. When the {gen_event, 'closed'} message arrives, the c2s process terminates normally.
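For reference, this is roughly what passing the option at the gen_tcp level looks like. It is only a sketch: the module name `listen_opts_sketch` is hypothetical, the 15000 ms timeout is an assumed value, and the other options are illustrative rather than the exact list ejabberd builds.

```erlang
-module(listen_opts_sketch).
-export([listen/1]).

%% Open a listening socket whose sends give up after a timeout and
%% close the socket when that happens.
listen(Port) ->
    Opts = [binary,
            {active, false},
            {reuseaddr, true},
            {send_timeout, 15000},        % assumed value, in milliseconds
            {send_timeout_close, true}],  % close the socket on send timeout
    gen_tcp:listen(Port, Opts).
```

Note that send_timeout_close only has an effect together with send_timeout; without a send timeout, gen_tcp:send/2 waits indefinitely.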

edited at 2015-11-24 03:40



I may have found a way to reproduce this problem:
1. Establish a normal c2s connection with an XMPP client.
2. Cut the client's network with a tool such as clumsy (which drops all TCP packets from the server).
3. Keep sending large packets to the c2s process.

At first, gen_tcp:send returns ok, until the send buffer fills up.
Then gen_tcp:send returns {error,timeout} because the send buffer is full.
The process then calls ejabberd_socket:close(Socket) to close the connection.

    send_text(StateData, Text) when StateData#state.mgmt_state == active ->
        catch ?INFO_MSG("Send XML on stream = ~ts", [Text]),
        case catch (StateData#state.sockmod):send(StateData#state.socket, Text) of
          {'EXIT', _} ->
              (StateData#state.sockmod):close(StateData#state.socket),
              error;
          _ ->
              ok
        end;

But ejabberd_socket:close/1 seems to be an asynchronous call, so the c2s process goes on to handle the next message in its message queue, calls gen_tcp:send/2 again, and waits for another send timeout.
Meanwhile, ejabberd_receiver calls gen_tcp:close(Socket) and the socket is closed, so the pending gen_tcp:send/2 never returns. I have tried this several times, and it happens every time.


In short: if I send packets to a client socket that cannot receive them and the send buffer is full, I get {error, timeout} after the send timeout. But if another process closes the socket asynchronously while I am waiting for that timeout inside gen_tcp:send/2, I never get a response.

So I reproduced this in a plain erl shell, and gen_tcp:send/2 gave no response (cutting the network at step 3, continuing to send packets, then closing asynchronously). Is this a real problem, or am I doing something wrong?




original post below


Generally in ejabberd, I route a message to the client process, which sends it to the TCP socket via the function below. This works well most of the time. Module ejabberd_c2s.erl:

    send_text(StateData, Text) when StateData#state.mgmt_state == active ->
        catch ?INFO_MSG("Send XML on stream = ~ts", [Text]),
        case catch (StateData#state.sockmod):send(StateData#state.socket, Text) of
          {'EXIT', _} ->
              (StateData#state.sockmod):close(StateData#state.socket),
              error;
          _ ->
              ok
        end;

But in some cases the c2s pid is blocked in gen_tcp:send, like this:

erlang:process_info(pid(0,8353,11)).
[{current_function,{prim_inet,send,3}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,96},
{messages ...}
...
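To spot other processes in the same state, something like the following can be run on the node. It is a rough diagnostic sketch, not ejabberd code: the module name `stuck_senders` and the queue-length threshold argument are hypothetical.

```erlang
-module(stuck_senders).
-export([find/1]).

%% Return {Pid, QueueLen} for every process currently inside
%% prim_inet:send/3 whose message queue holds at least MinQueueLen
%% messages. Processes that exit during the scan are skipped, because
%% process_info/2 then returns undefined and the patterns do not match.
find(MinQueueLen) ->
    [{Pid, Len}
     || Pid <- erlang:processes(),
        {current_function, {prim_inet, send, 3}} <-
            [erlang:process_info(Pid, current_function)],
        {message_queue_len, Len} <-
            [erlang:process_info(Pid, message_queue_len)],
        Len >= MinQueueLen].
```

For example, `stuck_senders:find(50).` in the shell lists senders with 50 or more queued messages. A process that shows up once may just be in the middle of a normal send; one that shows up repeatedly with a growing queue is the case described here.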

Most cases happen when the user's network is in poor condition. The receiver process should then send two messages to the c2s pid, and c2s should either terminate the session or wait for a resume:

{'$gen_event',closed}
{'DOWN',#Ref<0.0.1201.250595>,process,<0.19617.245>,normal}

I printed the message queue of the c2s process: the two messages are in the queue, waiting to be handled. Unfortunately, the queue no longer moves, because the process blocked before handling them. As described above, it is stuck in prim_inet:send/3 while trying to do gen_tcp:send/2. The queue grows very large over several days, and ejabberd crashes when the process asks for more memory.

prim_inet:send/3 source:
send(S, Data, OptList) when is_port(S), is_list(OptList) ->
    ?DBG_FORMAT("prim_inet:send(~p, ~p)~n", [S,Data]),
    try erlang:port_command(S, Data, OptList) of
        false -> % Port busy and nosuspend option passed
            ?DBG_FORMAT("prim_inet:send() -> {error,busy}~n", []),
            {error,busy};
        true ->
            receive
                {inet_reply,S,Status} ->
                    ?DBG_FORMAT("prim_inet:send() -> ~p~n", [Status]),
                    Status
            end
    catch
        error:_Error ->
            ?DBG_FORMAT("prim_inet:send() -> {error,einval}~n", []),
            {error,einval}
    end.


It seems the port driver did not reply with {inet_reply,S,Status} after erlang:port_command(S, Data, OptList), so gen_tcp:send blocks forever. Can anyone explain this?

Gang Zhao
  • Isn't this more or less what you'd expect though? If network conditions are poor the driver has to queue transmissions one way or another, and the queue can only be so big... – Michael Oct 26 '15 at 09:54
  • Maybe you can relieve the problem by setting the gen_tcp high_watermark? – Michael Oct 26 '15 at 10:03
  • And/or high_msgq_watermark. – Michael Oct 26 '15 at 10:25
  • @Michael It's not about fast or slow, the problem is the function never returns after i call gen_tcp:send/2 – Gang Zhao Oct 26 '15 at 14:00
  • A blocking socket will always block until the data you write is copied to a kernel buffer. This is usually quick, but if the kernel buffer is full it can't return until it has emptied enough for the data you are writing to be copied. – Michael Oct 26 '15 at 14:41

1 Answer

It depends on the version of Erlang you are using. The send timeout option for gen_tcp is not used in old ejabberd versions because it was not available in Erlang at the time. Moreover, you need a very recent version of Erlang, as some bugs related to that option were fixed in Erlang itself.

Mickaël Rémond