0

I have a problem. I use a gen_server to do some simple work like this:

  one handle_cast clause that does a long-running job (takes 60 seconds)
  one handle_cast clause that does a very fast job

Everything is fine when traffic is low. But when the server process is busy with the long-running job and clients send it a large number of messages (1,000,000 messages in the mailbox, for example), the long-running job becomes extremely slow: it may take 600 seconds to finish.

The problem looks just like this Stack Overflow question: "In Erlang, when a process's mailbox growth bigger, it runs slower, why?".

But I still don't understand. If it is because of garbage collection, how could garbage collection take so long or run so frequently?

蛋汤饭
  • Could you show the code for the `handle_cast` clause that takes a long time? One thing that could cause this is if there is a `receive` expression that matches on specific messages. – legoscia Dec 09 '16 at 13:25
  • @legoscia Oh, no. This process just inserts about 300,000 documents into MongoDB in a loop. My machine can write more than 5,000 documents per second, and the slowdown is not caused by MongoDB disk I/O either. – 蛋汤饭 Dec 09 '16 at 14:43
  • @legoscia And this problem is just like the one described at http://stackoverflow.com/questions/36216246/in-erlang-when-a-processs-mailbox-growth-bigger-it-runs-slower-why – 蛋汤饭 Dec 09 '16 at 14:50
  • Questions like this are best accompanied by an [SSCCE](http://sscce.org/). Otherwise, others can mostly only guess about what's happening. – Ryan Stewart Dec 10 '16 at 05:58

3 Answers

1

If you have one process with such a big mailbox, your system probably has a design error.

All traffic goes through one single process, which makes it a bottleneck.

One of the main ideas in Erlang is that creating a new process is fast and cheap, so each transaction should have its own process.

A single process for all transactions is necessary only where the transactions need serialized access to some shared resource (typically updates to an ETS table). This serialized access (using messages) should be as short as possible.

  • Yeah, the system design might be wrong. But still, the question is why a large mailbox makes the process slow. – 蛋汤饭 Dec 09 '16 at 14:51
  • It could be caused by selective receive. – user8755563 Dec 09 '16 at 16:14
  • No, gen_server doesn't have a selective receive, and there isn't any receive statement in my code either. – 蛋汤饭 Dec 09 '16 at 16:16
  • Selective receive: see for example http://ndpar.blogspot.cz/2010/11/erlang-explained-selective-receive.html – user8755563 Dec 09 '16 at 16:18
  • The receive is inside gen_server. If you have a long mail queue, every receive checks each item in the queue. So a big queue means performance degradation. Change your code to have more than one worker processing the inputs. – user8755563 Dec 09 '16 at 16:20
0

Using handle_cast does not make the code inside the callback non-blocking. Only the client side is released, so the client does not wait for the request to complete.

So if a request needs 60 seconds to complete, you have to spawn a separate process from your server to handle it. It is the only way to keep handling new incoming messages.
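As a rough illustration of spawning the long job from the server, here is a minimal sketch. The module name `offload_server`, its API functions, and the `timer:sleep/1` call that stands in for the real MongoDB insert loop are all hypothetical; only `gen_server` and `spawn_monitor/1` are standard OTP.

```erlang
-module(offload_server).
-behaviour(gen_server).

-export([start_link/0, long_work/2, fast_work/1, stats/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Millis is a stand-in for the duration of the real job (assumption).
long_work(Server, Millis) -> gen_server:cast(Server, {long_work, Millis}).
fast_work(Server) -> gen_server:cast(Server, fast_work).
stats(Server) -> gen_server:call(Server, stats).

init([]) ->
    {ok, #{fast => 0, running => 0, finished => 0}}.

handle_call(stats, _From, State) ->
    {reply, State, State}.

handle_cast({long_work, Millis}, State = #{running := R}) ->
    %% Run the slow job in its own monitored process, so the server
    %% goes straight back to its receive loop and keeps draining
    %% its mailbox. timer:sleep/1 is a placeholder for the real work.
    {_Pid, _Ref} = spawn_monitor(fun() -> timer:sleep(Millis) end),
    {noreply, State#{running := R + 1}};
handle_cast(fast_work, State = #{fast := F}) ->
    {noreply, State#{fast := F + 1}}.

%% The monitor delivers a 'DOWN' message when a worker exits.
handle_info({'DOWN', _Ref, process, _Pid, _Reason},
            State = #{running := R, finished := Done}) ->
    {noreply, State#{running := R - 1, finished := Done + 1}};
handle_info(_Other, State) ->
    {noreply, State}.
```

With this shape, a `fast_work` cast sent right after a `long_work` cast is handled immediately instead of waiting behind it. Whether that is safe depends on the point raised next: the fast requests must be compatible with the job still running.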

A new potential issue arises: is it really possible to handle new requests in parallel with the document insertion into MongoDB?

If yes, everything is fine.

If no, then you will have to modify your design, for example by returning a negative acknowledgement to any client whose request is not compatible with the database insertion (or by ignoring the request if possible). You have to empty the mailbox as fast as possible. Accumulating messages for 60 seconds is not a good option: you are pushing Erlang far beyond its normal usage. Just imagine your use case:

  • a long request is received by the server, which enters the database update loop;
  • during this time, the incoming messages accumulate in the mailbox, which means that they are copied into the memory area of the server;
  • very soon, the process runs out of memory, and the VM has to suspend the process (and so your database loop) to increase the memory allocated to the server, and possibly copy some data;
  • this memory management slows down the server, the database update takes longer, more messages accumulate, and so on.
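One way to keep the mailbox from growing without bound is to refuse work before it is queued. The sketch below is a client-side variation on the negative-acknowledgement idea, not the server-side reply described above: the sender checks the server's queue length with `erlang:process_info/2` and only casts when there is room. The module name `shed`, the function `cast_if_room/3`, and the threshold are hypothetical, and the check only works for a pid on the local node.

```erlang
-module(shed).
-export([cast_if_room/3]).

%% Returns ok if Msg was cast, {error, overloaded} if the server's
%% mailbox already holds MaxQueue or more messages, and
%% {error, noproc} if the server process is no longer alive.
%% Note: process_info/2 requires a pid on the local node.
cast_if_room(Server, Msg, MaxQueue) when is_pid(Server) ->
    case erlang:process_info(Server, message_queue_len) of
        {message_queue_len, Len} when Len < MaxQueue ->
            gen_server:cast(Server, Msg);
        {message_queue_len, _TooMany} ->
            {error, overloaded};
        undefined ->
            {error, noproc}
    end.
```

The check and the cast are not atomic, so the limit is approximate under concurrent senders. It is meant as cheap back-pressure, not as a hard bound.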
Pascal
  • Thank you for your answer. I will try to redesign my server. But I still find it hard to believe that memory management alone slows my process down from 60 s to 600 s. – 蛋汤饭 Dec 10 '16 at 14:53
0

Finally, I found my answer in this paper.

In section 3: the memory architecture of Erlang is process-centric. Each process allocates and manages its own memory area, which typically includes a PCB, a private stack, and a private heap.

This causes a disadvantage: high memory fragmentation.

A process cannot utilize the memory (e.g., the heap) of another process even if there are large amounts of unused space in that memory area. This typically implies that processes can allocate only a small amount of memory by default. This in turn usually results in a larger number of calls to the garbage collector.

So a large mailbox can slow down the whole system.

But this is still not the main point. Following the great comment by @pascal about the long time spent talking to the database: I had ignored that gen_server is all about send and receive. I looked into gen_server:call, which is implemented in gen, and there is obviously a selective receive! Forgive me for having only a small taste of Erlang so far.
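To see the effect in isolation, here is a small experiment, offered as a sketch rather than a reproduction of my server. The module `mailbox_scan` and its message shapes are made up for illustration. It matches replies on a plain tagged tuple, not on a freshly created reference, so it models the unoptimized case where every receive starts scanning at the head of the mailbox.

```erlang
-module(mailbox_scan).
-export([run/2, echo/0]).

%% Echo process: answers every {ping, From, N} with {pong, N}.
echo() ->
    receive
        {ping, From, N} ->
            From ! {pong, N},
            echo();
        stop ->
            ok
    end.

%% Puts Backlog unrelated messages into the caller's own mailbox,
%% then performs Calls synchronous round trips to the echo process.
%% Returns {Microseconds, QueueLengthAfterTheRoundTrips}.
run(Backlog, Calls) ->
    Echo = spawn(?MODULE, echo, []),
    lists:foreach(fun(I) -> self() ! {noise, I} end,
                  lists:seq(1, Backlog)),
    {Micros, ok} = timer:tc(fun() -> round_trips(Echo, Calls) end),
    Echo ! stop,
    {message_queue_len, Len} = process_info(self(), message_queue_len),
    flush_noise(),
    {Micros, Len}.

round_trips(_Echo, 0) ->
    ok;
round_trips(Echo, N) ->
    Echo ! {ping, self(), N},
    %% Selective receive: each {noise, _} queued ahead of the reply
    %% is inspected and skipped before {pong, N} can match.
    receive
        {pong, N} -> round_trips(Echo, N - 1)
    end.

%% Remove the backlog again so the caller's mailbox is left clean.
flush_noise() ->
    receive
        {noise, _} -> flush_noise()
    after 0 ->
        ok
    end.
```

Comparing `mailbox_scan:run(0, 1000)` with `mailbox_scan:run(1000000, 1000)` in the shell should show the second taking far longer, because each of the 1000 replies is found only after skipping the whole backlog. The exact numbers depend on the machine and the OTP release.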

蛋汤饭
  • In addition, I suspect that the 60-second request to update MongoDB contains synchronous calls that force the server process to parse its mailbox many times: a synchronous access to a process is done by sending a message and waiting for the response. The receive statement is executed in the calling process (your server), and each time a message arrives it has to parse the mailbox. There is a mechanism to optimize this parse, but I think that if answers from the database and other incoming requests are interleaved, the optimization is less efficient. – Pascal Dec 12 '16 at 05:16
  • That's a really great point! My Mongo operations are all sync calls (db_pool implemented with poolboy), and I looked into the Erlang/OTP gen module: https://github.com/erlang/otp/blob/OTP-18.3/lib/stdlib/src/gen.erl#L146 . It is the selective receive that slows my process down! Thanks again. – 蛋汤饭 Dec 12 '16 at 07:19