How to prepare data for a domain-specific chatbot

Question

I am trying to make a chatbot. All the chatbots I looked at (Rasa, IBM Watson and other well-known ones) are built on structured data. Is there any way to convert unstructured data into some sort of structure that can be used for bot training? Consider the paragraphs below:

Packaging unit

A packaging unit is used to combine a certain quantity of identical items to form a group. The quantity specified here is then used when printing the item labels so that you do not have to label items individually when the items are not managed by serial number or by batch. You can also specify the dimensions of the packaging unit here and enable and disable them separately for each item.

It is possible to store several EAN numbers per packaging unit, since these numbers may differ for each packaging unit even when the packaging units are identical. These settings can be found on the Miscellaneous tab. There are also two more settings in the system settings that are relevant to mobile data entry:

When creating a new item, the item label should be printed automatically. For this reason, we have added the option ‘Print item label when creating new storage locations’ to the settings. When using mobile data entry devices, every item should be assigned to a storage location, where an item label is subsequently printed that should be applied to the shelf in the warehouse to help identify the item faster.

How can I build a bot from data like this? Any lead would be highly appreciated. Thanks! Would the idea in the picture work?

score 2 · Answer 1 · answered Nov 06 '18 at 13:21

The data you are showing seems to be a good candidate for passage search. Basically, you want to answer the user's question with the most relevant paragraph found in your training data. This use case is handled by the Watson Discovery service, which can analyze unstructured data such as yours; you then query the service with the input text, and it answers with the closest passage found in the data.
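As a first step toward "some sort of structure", you can split the manual into paragraph-sized passages and keep each one as a record. Below is a minimal Python sketch, assuming paragraphs are separated by blank lines; the function name `to_passage_records` and the record fields (`id`, `source`, `text`) are illustrative choices of mine, not an input format required by Watson Discovery or any other product.

```python
import json
import re


def to_passage_records(raw_text: str, source: str) -> list[dict]:
    """Split raw help text into paragraph-sized passage records.

    Assumption: paragraphs are separated by one or more blank lines.
    The record fields (id, source, text) are illustrative only.
    """
    chunks = [chunk.strip() for chunk in re.split(r"\n\s*\n", raw_text)]
    records = []
    for chunk in chunks:
        if not chunk:  # skip empty chunks, e.g. from leading/trailing blank lines
            continue
        records.append({
            "id": f"{source}-{len(records)}",
            "source": source,
            "text": " ".join(chunk.split()),  # collapse line breaks and extra spaces
        })
    return records


if __name__ == "__main__":
    manual = """A packaging unit is used to combine a certain quantity
of identical items to form a group.

It is possible to store several EAN numbers per packaging unit.
"""
    # One JSON object per line ("JSON Lines") is easy to load into most indexing tools.
    for record in to_passage_records(manual, source="packaging-unit"):
        print(json.dumps(record))
```

Each record is then one candidate answer for a passage search.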

In my experience, you can also get good results by implementing your own custom TF/IDF algorithm tailored to your use case (TF/IDF is a simple similarity measure that, for example, takes care of stopwords for you by giving very common words a low weight).
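A minimal sketch of such a custom TF/IDF passage ranker in plain Python, with no external libraries. The class name `TfIdfIndex`, the regex tokenizer and the smoothed IDF formula are my own illustrative choices; in practice you might prefer an existing implementation (e.g. scikit-learn's `TfidfVectorizer`) or a search engine.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfIdfIndex:
    """Minimal TF/IDF passage index with cosine-similarity ranking."""

    def __init__(self, passages: list[str]):
        self.passages = passages
        docs = [Counter(tokenize(p)) for p in passages]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF: terms found in many passages (typically stopwords)
        # get low weights, rare domain terms get high weights.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = [self._weigh(doc) for doc in docs]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        # Term frequency times IDF; terms unseen at indexing time are dropped.
        return {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}

    @staticmethod
    def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    def search(self, question: str, k: int = 1) -> list[tuple[float, str]]:
        """Return the k passages most similar to the question as (score, passage)."""
        q = self._weigh(Counter(tokenize(question)))
        scored = [(self._cosine(q, v), p) for v, p in zip(self.vectors, self.passages)]
        return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


if __name__ == "__main__":
    index = TfIdfIndex([
        "A packaging unit is used to combine a certain quantity of identical items to form a group.",
        "It is possible to store several EAN numbers per packaging unit.",
        "When creating a new item, the item label should be printed automatically.",
    ])
    for score, passage in index.search("How many EAN numbers can a packaging unit have?", k=2):
        print(f"{score:.3f}  {passage}")
```

The best-scoring passage is returned as the bot's answer; a score threshold could be added so the bot says "I don't know" when nothing matches well.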

If, on the other hand, your goal is to bootstrap a rule-based chatbot from this kind of data, then the data is not ideal. For a rule-based chatbot, the best data would be actual conversations between users asking questions about the target domain and answers from a subject-matter expert. With the data you have, you might at least be able to do some analysis that helps you pinpoint the relevant topics and domains the chatbot should handle; however, I think you will have a hard time using it to bootstrap a set of intents (the questions users will ask) for a rule-based chatbot.
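For the kind of topic analysis mentioned above, one rough starting point is to list the highest-weighted TF/IDF terms of each passage as candidate topics to review by hand. This is only a sketch under assumptions: the tiny `STOPWORDS` set and the `candidate_topics` helper are hypothetical, and the output merely hints at topics; it does not produce intents.

```python
import math
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list; extend it for real data.
STOPWORDS = {"a", "an", "the", "is", "are", "be", "can", "to", "of", "and",
             "or", "in", "on", "for", "per", "when", "that", "it", "this"}


def candidate_topics(passages: list[str], top_n: int = 3) -> list[list[str]]:
    """Return the top_n highest-weighted TF/IDF terms of each passage as rough topic hints."""
    docs = [
        Counter(t for t in re.findall(r"[a-z0-9]+", p.lower()) if t not in STOPWORDS)
        for p in passages
    ]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    topics = []
    for doc in docs:
        # Term frequency times smoothed inverse document frequency.
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in doc.items()}
        # Sort by descending weight; break ties alphabetically so output is deterministic.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        topics.append(ranked[:top_n])
    return topics


if __name__ == "__main__":
    manual = [
        "A packaging unit groups identical items. The packaging unit quantity is printed on item labels.",
        "Several EAN numbers can be stored per packaging unit.",
        "Item labels are printed automatically for new storage locations.",
    ]
    for passage, terms in zip(manual, candidate_topics(manual, top_n=2)):
        print(terms, "<-", passage[:40])
```

On real manuals the term lists get noisy, so treat them as input for a human deciding which topics the bot should cover.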

TL;DR: If I wanted to use a Watson service, I would start with Watson Discovery. Alternatively, I would implement my own search algorithm, starting with TF/IDF (which maps rather nicely to your proposed solution).

Thanks a lot for your answer. But on what basis does TF/IDF select the `start` and `end` of the answer in the text files? - Niraj D Pandey, Nov 12 '18 at 14:50
You need to split the documents into paragraphs, which can then be matched by `TF/IDF` based on their similarity to the question. The algorithm then finds the most relevant paragraph. - Michal Bida, Nov 22 '18 at 14:52
Yes! @Niraj needs a full-text search engine, and I feel you are right: old-fashioned TF/IDF may well be a good solution, without any AI "black box" in the cloud :) - Giorgio Robino, May 01 '20 at 16:19
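To make Michal Bida's comment concrete: TF/IDF does not pick a `start` and `end` inside a sentence; the answer boundaries are simply the boundaries of the paragraph that scores best. A minimal sketch, assuming blank-line-separated paragraphs; `paragraph_spans` and `best_answer_span` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def paragraph_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of paragraphs.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    raw, start = [], 0
    for sep in re.finditer(r"\n\s*\n", text):
        raw.append((start, sep.start()))
        start = sep.end()
    raw.append((start, len(text)))
    spans = []
    for s, e in raw:
        chunk = text[s:e]
        if chunk.strip():
            # Trim surrounding whitespace so text[start:end] is exactly the paragraph.
            spans.append((s + len(chunk) - len(chunk.lstrip()), e - (len(chunk) - len(chunk.rstrip()))))
    return spans


def best_answer_span(text: str, question: str) -> tuple[int, int]:
    """Return the (start, end) offsets of the paragraph most similar to the question."""
    spans = paragraph_spans(text)
    if not spans:
        raise ValueError("text contains no paragraphs")
    docs = [Counter(tokenize(text[s:e])) for s, e in spans]
    n = len(docs)
    df = Counter(t for d in docs for t in d)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def weigh(counts: Counter) -> dict[str, float]:
        # TF/IDF weights; question terms never seen in the text are ignored.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    q = weigh(Counter(tokenize(question)))
    scores = [cosine(q, weigh(d)) for d in docs]
    return spans[max(range(n), key=scores.__getitem__)]


if __name__ == "__main__":
    manual = (
        "A packaging unit is used to combine a certain quantity of identical items to form a group.\n\n"
        "It is possible to store several EAN numbers per packaging unit.\n\n"
        "When creating a new item, the item label should be printed automatically.\n"
    )
    start, end = best_answer_span(manual, "How many EAN numbers can a packaging unit have?")
    print(start, end, manual[start:end])
```

If you need finer-grained answers, the same idea works with sentences instead of paragraphs as the unit; only the splitting rule changes.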
