
I have the following:

a) A document base (hundreds of web pages) where we have captured all the help topics that users need.

b) A forums web page, which is active.

Question: What is the best way to search this content and pull up the relevant pages when a user types a query?

I am specifically trying to see whether I can use any of the API.ai, Wit.ai, Microsoft LUIS or IBM Watson frameworks. Does anyone have experience with these frameworks/APIs for indexing and querying HELP documents?

My understanding is that the above APIs just take the query and figure out the intent, entities and slots/utterances. After that, it is up to the application developer to query the help docs with the intent, entities and slots. But in the case of HELP documents, the intent is always the same ("to know" or "how to") and there are far too many entities, unlike the few entities in the airline or restaurant booking domains. How can one leverage such a huge document base using the Natural Language Understanding (NLU) from API.ai/Wit.ai/LUIS/Watson?

Constraint: Ideally, the content should stay at the web location (URL) where it is hosted. It is not possible to extract and upload all the help pages and the forums to some cloud.

aadidasu
  • On the Microsoft side, you are forgetting the QnAMaker solution, which sits on top of LUIS NLU and whose purpose is to provide a question-and-answer solution based on websites, files, etc. – Nicolas R Sep 13 '17 at 08:55
  • @NicolasR, I looked into QnAMaker. It asks for a URL but doesn't do any crawling as far as I understand. Most of the questions and answers it suggests are not relevant. I had to go through each question manually, correct the questions and edit the answers. There is a lot of manual effort needed. – aadidasu Sep 13 '17 at 14:40
  • It is crawling! https://qnamaker.ai/Documentation/CreateKb – Nicolas R Sep 13 '17 at 14:48
  • If you give it a public URL that has children's stories (http://www.kidsworldfun.com/shortstories_hareandtortoise.php), it just fails. And if you give it a link to a firm's help pages (https://www.cisco.com/c/en/us/support/routers/10008-router/model.html), it gives back only 4-6 questions, and many times it fails to index with the message "Oops!! Something went wrong. Please try again. If this error persists, please provide details via the Feedback menu." – aadidasu Sep 13 '17 at 14:59

1 Answer


In this case, speaking of IBM Watson, you can use two APIs: the Conversation service and the Discovery service.

Discovery:

With Discovery, you can use unstructured data for training and then query it for what you want. It only takes a few steps to prepare your unstructured data, create a query that will pinpoint the information you need, and then integrate those insights into your new application or existing solution.

Note: You can add Microsoft Word, PDF, HTML, and JSON documents to your collection.
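To make the "prepare the documents, then query them in natural language" idea concrete, here is a minimal sketch in plain Python. It is not the Discovery API and makes no claim about how Discovery ranks results; it is a simple TF-IDF stand-in, and the function names (`tokenize`, `rank_pages`) and the example URLs are hypothetical. It returns the URLs of the matching pages, so the content itself can stay where it is hosted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_pages(query, pages):
    """Return page URLs ordered by a simple TF-IDF score against the query.

    `pages` maps a URL to the plain text of that page (hypothetical input
    format; extracting text from HTML is out of scope for this sketch).
    """
    tokens_by_url = {url: tokenize(text) for url, text in pages.items()}
    n_pages = len(tokens_by_url)

    # Document frequency: in how many pages does each term appear?
    doc_freq = Counter()
    for tokens in tokens_by_url.values():
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = {}
    for url, tokens in tokens_by_url.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                # Smoothed inverse document frequency.
                idf = math.log((1 + n_pages) / (1 + doc_freq[term])) + 1.0
                score += (counts[term] / len(tokens)) * idf
        if score > 0:
            scores[url] = score

    # Highest-scoring pages first; pages with no matching term are omitted.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical help pages and forum thread, keyed by their hosted URL.
pages = {
    "https://example.com/help/reset-password":
        "How to reset your password from the login page",
    "https://example.com/help/configure-router":
        "Steps to configure the router and update its firmware",
    "https://example.com/forum/thread-42":
        "Forum thread: router firmware update fails after reboot",
}
results = rank_pages("how do I update router firmware", pages)
print(results)
```

A hosted service adds much more on top of this (document conversion, enrichment, relevance training), but the shape of the interaction is the same: a collection of documents in, a free-text question, a ranked list of documents out.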

[Figure: architecture of Discovery]

Note: Based on your example, try to focus on the query methods.

Conversation:

You can build a solution that understands natural-language input and uses machine learning to respond to customers in a way that simulates a conversation between humans, that is, a good virtual assistant or chatbot.

[Figure: architecture of Conversation]

Important: IBM Developers built a project using these services; you can watch the video example and follow the same logic to create your application.

Note: The Watson Developer Cloud on GitHub has many examples of using the IBM Watson APIs, and it has the SDKs you need to build your application with the Watson services.

Links:

  • The API reference for the Discovery service.
  • The API reference for the Watson Conversation service.
  • Watson Developer Cloud: SDKs for Java, C#, Node, Python and more, plus example projects.
Sayuri Mizuguchi
  • Can we give public URLs like http://cisco.com/c/en/us/support/routers/10008-router/model.html for Watson to index? Can the users then query it? – aadidasu Sep 13 '17 at 15:09
  • No, you need to have the `.html` file, as in [this](https://console.bluemix.net/docs/services/discovery/getting-started-tool.html#step-4-upload-your-documents) example. – Sayuri Mizuguchi Sep 13 '17 at 15:52
  • You can use "Inspect Element" to get all the data inside the `page.html` file at any public URL, save it as a new `.html` file, and then add that file to your Discovery service. – Sayuri Mizuguchi Sep 13 '17 at 15:57
  • How would I include a link to the forum? Many times the answers are in forums. – aadidasu Sep 14 '17 at 17:28
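The suggestion in the comments above, saving a public page as a local `.html` file before adding it to Discovery, could be scripted instead of done by hand through "Inspect Element". This is only a sketch using the Python standard library; the helper name `save_page` and the 30-second timeout are assumptions, and it saves the raw HTML as served, without any content rendered by JavaScript.

```python
import urllib.request
from pathlib import Path


def save_page(url, dest, timeout=30):
    """Download the document at `url` and write its raw bytes to `dest`.

    Hypothetical helper: returns the destination as a Path so the caller
    can pass the saved file on to whatever upload step comes next.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    dest_path = Path(dest)
    dest_path.write_bytes(data)
    return dest_path
```

For example, `save_page("https://www.cisco.com/c/en/us/support/routers/10008-router/model.html", "model.html")` would write that help page to `model.html` in the current directory, assuming the site allows the request. The same call works for a forum thread URL, which is one way forum answers could end up in the same collection as the help pages.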