How to run RASA Server in multi thread mode using gunicorn

Question

The Rasa server is running fine on the server using a single thread:

python -m rasa_nlu.server --path projects --emulate dialogflow --response_log logs

I would like to enable multi-threading on the server. As per the Rasa documentation (https://nlu.rasa.com/0.8.12/http.html), I am trying to run the command below.

gunicorn -w 4 --threads 12 -k gevent -b 127.0.0.1:5000 rasa_nlu.wsgi

This gives me an error.

Please suggest.

score 2 · Accepted Answer · answered Aug 02 '18 at 03:21

This is no longer possible. The Rasa documentation you are pointing to is for version 0.8; they are now on version 0.12. Several factors contributed to support for this being removed, primarily:

High memory usage for language models
Move from Flask to Klein for asynchronous training

Here's a GitHub issue with some more information: https://github.com/RasaHQ/rasa_nlu/issues/793

If you are aiming for higher overall throughput of /parse requests, the recommendation is to use Docker combined with nginx to run multiple instances on the same server, if the server is large enough to handle it, or to run multiple smaller instances, still behind an nginx reverse proxy.
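As a rough illustration of the multi-instance setup, here is a minimal Python sketch that prints one server command per instance and a matching nginx upstream block. The instance count, the ports, the upstream name rasa_nlu and the helper names are all assumptions made for this example, not something taken from the Rasa docs. It also assumes the server accepts a --port option and that the generated nginx block is placed inside the http section of your nginx configuration.

```python
"""Sketch: plan N single-process Rasa NLU instances behind one nginx proxy.

All names, ports and counts below are illustrative assumptions.
"""


def launch_commands(n_instances, base_port=5001, project_path="projects"):
    """Return one server command per instance, each on its own port."""
    return [
        f"python -m rasa_nlu.server --path {project_path} --port {base_port + i}"
        for i in range(n_instances)
    ]


def nginx_upstream(n_instances, base_port=5001, listen_port=5000):
    """Return an nginx snippet that spreads requests over the instances.

    nginx uses round-robin balancing for an upstream group by default.
    The snippet is meant to live inside the `http { ... }` block.
    """
    servers = "\n".join(
        f"    server 127.0.0.1:{base_port + i};" for i in range(n_instances)
    )
    return (
        "upstream rasa_nlu {\n"
        f"{servers}\n"
        "}\n"
        "server {\n"
        f"    listen {listen_port};\n"
        "    location / {\n"
        "        proxy_pass http://rasa_nlu;\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Print the commands to start by hand (or from a process manager),
    # followed by the proxy configuration.
    for command in launch_commands(4):
        print(command)
    print()
    print(nginx_upstream(4))
```

Keep in mind that each instance loads its own copy of the language models, so memory use grows with the number of instances.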

Note that training has already been moved into separate processes. The number of processes available for training can be set with the --max_training_processes argument. Some components of the Rasa pipeline also support multiple threads; the number of threads available to these components can be set with the --num_threads argument.
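For example, a small Python helper along these lines could assemble a server command using both arguments. The helper name and the values 2 and 12 are placeholders picked for illustration; only the flag names come from the text above.

```python
def server_command(max_training_processes=1, num_threads=1, project_path="projects"):
    """Build the argument list for one Rasa NLU server process.

    The values are illustrative; tune them to your CPU count and memory.
    """
    return [
        "python", "-m", "rasa_nlu.server",
        "--path", project_path,
        # Number of separate processes the server may use for training.
        "--max_training_processes", str(max_training_processes),
        # Threads available to pipeline components that support threading.
        "--num_threads", str(num_threads),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(server_command(max_training_processes=2, num_threads=12)))
```

The returned list can also be passed directly to subprocess.Popen if you want to start the server from Python.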

I tried running the server with the --num_threads = 12 option, but when I check the number of threads running on the Unix box, it shows only 1 thread. Will it scale up as load increases, or should it be showing all 12 threads running in my case? Thanks! - Nishant Adhikari, Aug 06 '18 at 15:53
Only certain components are capable of using more than one thread. The main process is not. So if you try training a model using the spacy pipeline, you should see it briefly use 12 threads, then drop back down. - Caleb Keller, Aug 06 '18 at 15:54
It has been said that many components of Rasa don't support multithreading. So, if multiple requests come in to Rasa NLU, would it be able to handle them or not? Is such a configuration possible? - Sand T, Dec 10 '18 at 05:29
