I am trying to build a database Q/A chatbot (more specifically, a Natural Language Interface to Database), and I am having trouble extracting the entities/slots from the natural language query.
Take this example. I have a table:
| Interns | Branch | Birthday | Salary ($/h) |
|---|---|---|---|
| A | Mechanical Engineering | 2000-01-20 | 25 |
| B | IT Engineering | 1999-05-09 | 45 |
| A | Electrical Engineering | 2000-01-20 | 35 |
| C | Mechanical Engineering | 2002-09-13 | 35 |
Example questions that a user may ask about this table:
What is the total salary for intern A?
- Desired Entities: {Interns: A}

Tell me the aggregate salary for A, B and C.
- Desired Entities: {Interns: [A,B,C]} #Notice how the column name isn't mentioned

Which Interns are pursuing Mechanical Engineering Branch?
- Desired Entities: {Branch: Mechanical Engineering}
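
To make the desired output concrete, here is a naive sketch of the mapping I am after. It only looks up cell values that are already known from the table, so it illustrates the target format rather than a real solution: it would not handle unseen values or paraphrases. The `TABLE` dict and the `extract_entities` function are hypothetical names made up for this illustration.

```python
import re

# Hypothetical in-memory copy of the example table: column -> distinct values.
TABLE = {
    "Interns": ["A", "B", "C"],
    "Branch": ["Mechanical Engineering", "IT Engineering", "Electrical Engineering"],
}


def extract_entities(query, table=TABLE):
    """Return {column: value or [values]} for known cell values found in the query."""
    entities = {}
    for column, values in table.items():
        found = []
        for value in values:
            # Whole-word match. Single-letter values are matched case-sensitively
            # so that the article "a" is not mistaken for intern "A"
            # (a sentence starting with "A ..." would still be a false positive).
            flags = 0 if len(value) == 1 else re.IGNORECASE
            if re.search(r"\b" + re.escape(value) + r"\b", query, flags):
                found.append(value)
        if found:
            # A single hit is returned as a scalar, several hits as a list.
            entities[column] = found[0] if len(found) == 1 else found
    return entities


for question in [
    "What is the total salary for intern A?",
    "Tell me the aggregate salary for A, B and C.",
    "Which Interns are pursuing Mechanical Engineering Branch?",
]:
    print(question, "->", extract_entities(question))
```

Note that the second question is resolved to the `Interns` column even though the column name is never mentioned, because the lookup goes from value to column.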
Question:
How can I identify these entities/slots?
- This answer to a similar question suggests using rule-based recognizers, but I couldn't find out how to build them.
Things I have tried:
- Creating a custom Named Entity Recognition model using spaCy to identify the Interns and Branch names. This model identified values that appeared in the training data but failed to identify new values.
- Rules based on part-of-speech tagging: this approach was partly successful but wasn't generic, meaning it may not work if the same question is phrased in another way.