suggest list of how-to articles based on text content

Question

I have 20,000 messages (combination of email and live chat) between my customer and my support staff. I also have a knowledge base for my product.

Often times, the questions customers ask are quite simple and my support staff simply point them to the right knowledge base article.

What I would like to do, in order to save my support staff time, is to show my staff a list of articles that may likely be relevant based on the initial user's support request. This way they can just copy and paste the link to the help article instead of loading up the knowledge base and searching for the article manually.

I'm wondering what solutions I should investigate.

My current line of thinking is to run analysis on existing data and use a text classification approach:

For each message, see if there is a response with a link to a how-to article
If Yes, extract key phrases (microsoft cognitive services)
TF-IDF?
Treat each how-to as a 'classification' that belongs to sets of key phrases
Use some supervised machine learning, support vector machines maybe to predict which 'classification, aka how-to article' belongs to key phrase determined from a new support ticket.
Feed new responses back into the set to make the system smarter.

Not sure if I'm over complicating things. Any advice on how this is done would be appreciated.

PS: naive approach of just dumping 'key phrases' into search query of our knowledge base yielded poor results since the content of the help article is often different than how a person phrases their question in an email or live chat.

This is an interesting application for the Machine Learning theory that I have just acquired! — JackCColeman, Feb 06 '17 at 01:56

JackCColeman · Answer 1 · 2017-02-06T02:23:20.793

A simple classifier along the lines of a "spam" classifier might work, except that each FAQ would be a feature as opposed to a single feature classifier of spam, not-spam.

Most spam-classifiers start-off with a dictionary of words/phrases. You already have a start on this with your naive approach. However, unlike your approach a spam classifier does much more than a text search. Essentially, in a spam classifier, each word in the customer's email is given a weight and the sum of weights indicates if the message is spam or not-spam. Now, extend this to as many features as FAQs. That is, features like: FAQ1 or not-FAQ1, FAQ2 or not-FAQ2, etc.

Since your support people can easily identify which of the FAQs an e-mail requires then using a supervised learning algorithm would be appropriate. To reduce the impact of any miss-classification errors, then consider the application presenting a support person with the customer's email followed by the computer generated response and all the support person would have to-do is approve the response or modify it. Modifying a response should result in a new entry in the training set.

Support Vector Machines are one method to implement machine learning. However, you are probably suggesting this solution way too early in the process of first identifying the problem and then getting a simple method to work, as well as possible, before using more sophisticated methods. After all, if a multi-feature spam classifier works why invest more time and money in something else that also works?

Finally, depending on your system this is something I would like to work-on.

suggest list of how-to articles based on text content

1 Answers1