
I'm training a model for recognizing short, one to three sentence strings of text using the MITIE back-end in Rasa. The model trains and works using spaCy, but it isn't quite as accurate as I'd like. Training on spaCy takes no more than five minutes, but training for MITIE ran for several days non-stop on my computer with 16GB of RAM. So I started training it on an Amazon EC2 r4.8xlarge instance with 255GB RAM and 32 threads, but it doesn't seem to be using all the resources available to it.

In the Rasa config file, I have num_threads: 32 and set max_training_processes: 1, which I thought would help use all the memory and computing power available. But now that it has been running for a few hours, CPU usage is sitting at 3% (100% usage but only on one thread), and memory usage stays around 25GB, one tenth of what it could be.
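For reference, the relevant section of my config looks roughly like this (the feature-extractor path is a placeholder for my actual file):

```json
{
  "pipeline": "mitie",
  "mitie_file": "data/total_word_feature_extractor.dat",
  "num_threads": 32,
  "max_training_processes": 1
}
```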

Do any of you have any experience with trying to accelerate MITIE training? My model has 175 intents and a total of 6000 intent examples. Is there something to tweak in the Rasa config files?

hackerman

1 Answer


I'm going to address this from several angles. First, from the Rasa NLU angle specifically, the docs say:

Training MITIE can be quite slow on datasets with more than a few intents.

and provide two alternatives:

  • Use the `mitie_sklearn` pipeline, which trains the intent classifier using sklearn.
  • Use the MITIE fork where Tom B from Rasa has modified the code to run faster in most cases.
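Switching to the first alternative should only require changing the pipeline entry in your config; roughly (the feature-extractor path is a placeholder):

```json
{
  "pipeline": "mitie_sklearn",
  "mitie_file": "data/total_word_feature_extractor.dat",
  "num_threads": 32
}
```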

Given that you're only getting a single core used, I doubt this will have much impact, but Alan from Rasa has suggested setting num_threads to 2-3x your number of cores.

If you haven't evaluated both of those possibilities then you probably should.


Not all aspects of MITIE are multi-threaded. See this issue opened by someone else using Rasa on the MITIE GitHub page and quoted here:

Some parts of MITIE aren't threaded. How much you benefit from the threading varies from task to task and dataset to dataset. Sometimes only 100% CPU utilization happens and that's normal.


On the training-data side, I would recommend looking at the evaluate tool recently introduced into the Rasa repo. It includes a confusion matrix that can help identify trouble areas.
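If I remember the invocation correctly, the evaluate tool runs as a module against your training data and a trained model; roughly (both paths here are placeholders):

```shell
python -m rasa_nlu.evaluate \
  --data data/training_examples.json \
  --model models/current
```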

This may let you switch back to spaCy, use a portion of your 6000 examples as an evaluation set, and add examples back to the intents that aren't performing well.
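The split itself doesn't need anything Rasa-specific; here's a minimal per-intent hold-out sketch in plain Python (the `text`/`intent` keys mirror Rasa's training-data format, but the function is otherwise generic):

```python
import random

def train_eval_split(examples, eval_frac=0.2, seed=42):
    """Hold out a fraction of examples from each intent for evaluation.

    Splitting per intent (rather than over the whole list) keeps every
    intent represented in both sets, which matters when classes are
    imbalanced.
    """
    by_intent = {}
    for ex in examples:
        by_intent.setdefault(ex["intent"], []).append(ex)

    rng = random.Random(seed)
    train, evaluation = [], []
    for exs in by_intent.values():
        exs = exs[:]          # don't shuffle the caller's lists in place
        rng.shuffle(exs)
        n_eval = max(1, int(len(exs) * eval_frac))
        evaluation.extend(exs[:n_eval])
        train.extend(exs[n_eval:])
    return train, evaluation
```

Evaluating on the held-out set, then moving a few of its examples into training for the worst-confused intents, is a cheap iteration loop.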


I have more questions: where did the 6000 examples come from, are they balanced across intents, how distinct is each intent, and have you verified that words from the training examples appear in the corpus you are using? But I think the above is enough to get started.

It will be no surprise to the Rasa team that MITIE is taking forever to train; it will be more of a surprise that you can't get good accuracy out of another pipeline.

As a last resort, I would encourage you to open an issue on the Rasa NLU GitHub page and engage the team there for further support, or join the Gitter conversation.

Caleb Keller
  • Thanks for the reply and I appreciate you highlighting the new evaluate tool. I actually am currently using the `mitie_sklearn` pipeline and the MITIE fork. I've found that `ner_mitie` performs better than `ner_crf` on my data set. The `ner_mitie` component (which is part of the `mitie_sklearn` pipeline) is what I'm having trouble multi-threading, and it is what's slowing down training. It sounds like the conclusion is that `ner_mitie` doesn't support multi-threading. – hackerman Sep 17 '17 at 02:49
  • Hold the phone, you specifically were talking about intents and now you've moved onto NER. When you say the accuracy is low is that intent or entity accuracy? When you say you have 6000 examples are those intent examples or entity examples or both? If you have 175 intents how many entities do you have and how many training examples per entity. Can you provide some examples of entities that you have? – Caleb Keller Sep 17 '17 at 03:28
  • Sorry for the confusion. There are around 900 entity examples total for 25 distinct entities. An example of an entity would be `broken_things`, for example: "my tv isn't working", where "tv" is an instance of the entity `broken_things` with the value tv. The `ner_crf` model misses "tv" in testing much more frequently than `ner_mitie`. (edited comment for clarity) – hackerman Sep 17 '17 at 04:02
  • I've done my best to answer the original question, which was how to speed up MITIE. I think we should address your entity model accuracy in a new question. If you create one and tag it with rasa-nlu I'll see it, or create an issue on GitHub. I have several ideas. – Caleb Keller Sep 19 '17 at 02:09