
I am building a chat bot with rasa-nlu. I went through the tutorial and I have built a simple bot. However, I need lots of training data for building a chat bot that is able to book a taxi. So I need data to build a specific bot.

Is there a repository, or corpus, for booking a taxi? Or is there a way to generate this kind of dataset?

Andrea Madotto
  • 183
  • 2
  • 15
jsphdnl
  • 415
  • 6
  • 21

4 Answers

4

The blog post linked below, written by one of the founders of Rasa, has some really excellent advice. I think asking for a pre-built training set is the wrong way to go about it. Start the training set yourself, then ask friends and others to contribute until you've built one that works best for your bot.

Put on your robot costume

Beyond that, the Rasa docs have this under "improving model performance":

When the rasa_nlu server is running, it keeps track of all the predictions it’s made and saves these to a log file. By default log files are placed in logs/. The files in this directory contain one json object per line. You can fix any incorrect predictions and add them to your training set to improve your parser.
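As a rough sketch of that workflow, the snippet below reads log lines (one JSON object per line), keeps only the utterances you have reviewed, and emits them in the rasa_nlu JSON training format. The record keys `"text"` and `"intent"` and both function names are assumptions for illustration; check your own log files for the actual field names.

```python
import json


def parse_log_lines(lines):
    """Parse an iterable of lines, each holding one JSON object; skip blanks."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def to_training_data(records, reviewed):
    """Build rasa_nlu training data from reviewed log records.

    `reviewed` maps an utterance text to the intent a human chose for it.
    Records whose text is not in `reviewed` are left out.
    ASSUMPTION: every record has a "text" key; adapt this to your log format.
    """
    examples = []
    for record in records:
        text = record["text"]
        if text in reviewed:
            examples.append(
                {"text": text, "intent": reviewed[text], "entities": []}
            )
    return {"rasa_nlu_data": {"common_examples": examples}}
```

You would pass an open log file as `lines`, then dump the result with `json.dump` and merge it into your existing training file.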

I think you'll be surprised how far you can get with just the training set you can come up with yourself.

Good luck finding a corpus; either way, I hope these links and snippets help.

Caleb Keller
  • 2,151
  • 17
  • 26
2

One way to do this is to head over to LUIS.AI.

Log in with an Office 365 account and create your own taxi-booking app by defining intents and utterances, like below:

enter image description here

enter image description here

After training and publishing the model, download the corpus like below:

enter image description here

Once downloaded, the corpus will look something like this:

enter image description here
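Before training on the downloaded corpus, it can help to check how many utterances each intent has. This is only a sketch: it assumes the export is a JSON object with an `"utterances"` list whose items carry an `"intent"` field, and `count_utterances_per_intent` is a name I made up for illustration.

```python
from collections import Counter


def count_utterances_per_intent(luis_export):
    """Count utterances per intent in a loaded LUIS export.

    ASSUMPTION: the export has an "utterances" list and each item has an
    "intent" key. A missing list is treated as empty.
    """
    utterances = luis_export.get("utterances", [])
    return dict(Counter(item["intent"] for item in utterances))
```

Load the file with `json.load` and pass the resulting dict in; intents with only a handful of examples are the ones to extend first.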

Next, install RASA NLU. My machine runs Windows 8.1, so the steps to configure RASA are as follows:

First install: Anaconda 4.3.0 64-bit Windows for installing Python 3.6 interpreter: https://repo.continuum.io/archive/Anaconda3-4.3.0-Windows-x86_64.exe

&

Python Tools for Visual Studio 2015: https://ptvs.azureedge.net/download/PTVS%202.2.6%20VS%202015.msi

Next, install the following packages, in this order, from a command prompt running as administrator:

  1. Spacy Machine Learning Package: pip install -U spacy
  2. Spacy English Language Model: python -m spacy download en
  3. Scikit Package: pip install -U scikit-learn
  4. Numpy package for mathematical calculations: pip install -U numpy
  5. Scipy Package: pip install -U scipy
  6. sklearn-crfsuite package for CRF-based entity extraction: pip install -U sklearn-crfsuite
  7. NER Duckling for better Entity Recognition with Spacy: pip install -U duckling
  8. RASA NLU: pip install -U rasa_nlu==0.10.4

After installing all of the above packages successfully, create a spaCy configuration file for RASA to read, as follows:

{
    "project": "Travel",
    "pipeline": "spacy_sklearn",
    "language": "en",
    "num_threads": 1,
    "max_training_processes": 1,
    "path": "C:\\Users\\Kunal\\Desktop\\RASA\\models",
    "response_log": "C:\\Users\\Kunal\\Desktop\\RASA\\log",
    "config": "C:\\Users\\Kunal\\Desktop\\RASA\\config_spacy.json",
    "log_level": "INFO",
    "port": 5000,
    "data": "C:\\Users\\Kunal\\Desktop\\RASA\\data\\FlightBotFinal.json",
    "emulate": "luis",
    "spacy_model_name": "en",
    "token": null,
    "cors_origins": ["*"],
    "aws_endpoint_url": null
}

Next, create a directory structure like this:

data folder -> Will contain all LUIS formatted corpus

models -> Will contain all trained models

logs -> Will contain active learning logs and RASA framework logs

Like this,

enter image description here
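If you would rather script the folder setup than create it by hand, a minimal sketch could look like this. The function name `create_layout` is hypothetical, and the folder names are simply the three listed above.

```python
from pathlib import Path


def create_layout(root):
    """Create the data, models and logs folders under `root`.

    Existing folders are left untouched, so it is safe to run this twice.
    Returns the three folder paths in the order data, models, logs.
    """
    root = Path(root)
    folders = [root / name for name in ("data", "models", "logs")]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Call it with the folder that holds your configuration file, then copy the LUIS export into the `data` folder.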

Now create batch scripts for training and for starting the RASA NLU server.

Create TrainRASA.bat in Notepad or Visual Studio Code with the following contents:

python -m rasa_nlu.train -c config_spacy.json
pause

Then create StartRASA.bat in Notepad or Visual Studio Code with the following contents:

python -m rasa_nlu.server -c config_spacy.json
pause

Now train the model and start the RASA server by double-clicking the batch scripts you just created.

Everything is now ready. Open Chrome and issue an HTTP GET request to your /parse endpoint.

Like: http://localhost:5000/parse?q=&project=

You will get a JSON response that corresponds to LUISResult class of Bot Framework C#.

enter image description here
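The same GET request can also be issued from a script instead of the browser. This is a sketch using only the standard library; the `q` and `project` parameters come from the URL above, while the function names and the example values in the usage note are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_parse_url(base_url, text, project):
    """Build the /parse URL with URL-encoded q and project parameters."""
    query = urlencode({"q": text, "project": project})
    return f"{base_url}/parse?{query}"


def parse(base_url, text, project, timeout=10):
    """Send the GET request and decode the JSON body of the response.

    ASSUMPTION: the server is already running and reachable at base_url.
    """
    url = build_parse_url(base_url, text, project)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `parse("http://localhost:5000", "book a taxi", "Travel")` would query a server started with the configuration shown earlier.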

From there, implement whatever business logic your bot needs on top of the parsed result.

Alternatively, you can take a look at RASA Core, which was built mainly for this purpose. It uses machine learning to build dialogs instead of simple if-else statements.

Kunal Mukherjee
  • 5,775
  • 3
  • 25
  • 53
2

The link below contains datasets relevant for commercial chatbot applications ('human-machine' dialogues). It's a fairly comprehensive collection of both human-human and human-machine text dialogue datasets, as well as audio dialogue datasets. https://breakend.github.io/DialogDatasets/

Default picture
  • 710
  • 5
  • 12
2

We faced the same problem while trying to build a love relationship coach bot. Long story short, we decided to create a simple tool to collect data from our friends, our colleagues or people on Mechanical Turk: https://chatbotstrap.io.

The idea is to create polls like this one: https://chatbotstrap.io/en/project/q5pimyskbhna2rm?language=en&nb_scenarios=10 and send them to anyone you know. With that solution, we were able to build a dataset of more than 6000 sentences divided into 10 intents in a few days.

The tool is free as long as you agree that the dataset constructed with it can be open-sourced. There are also paid plans if you prefer to be the sole beneficiary of the data you collect.