I am building a system that converts natural language questions into SQL queries. Right now I am implementing a step that rewrites the natural language question into a more structured form, so that it is easier to convert into a SQL statement.
The restructured language will follow these rules:
1. What they want to do, e.g. "Find", "List", "Give"
2. The attributes they want us to retrieve, e.g. table attributes from the SQL schema
3. The entities that they want us to match on
This restructured language works well and can easily be transformed into SQL. The problem is that I generate every combination of the noun chunks and entities, which produces a large number of candidate sentences. Future development will help reduce these, but that is for later.
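To make the combinatorial growth concrete, here is a minimal sketch of how the candidates multiply. The function name `generate_candidates`, the `where` template, and the example values are all hypothetical placeholders for illustration, not my actual implementation.

```python
from itertools import product


def generate_candidates(actions, attributes, entities):
    """Build one sentence per (action, attribute, entity) combination.

    The 'action attribute where entity' template is an assumed stand-in
    for the real restructured language.
    """
    return [
        f"{action} {attribute} where {entity}"
        for action, attribute, entity in product(actions, attributes, entities)
    ]


candidates = generate_candidates(
    actions=["Find", "List"],
    attributes=["name", "salary"],
    entities=["department is Sales", "city is Boston"],
)

# The count is the product of the list sizes: 2 * 2 * 2 = 8
print(len(candidates))
```

Even with two options per slot there are already eight sentences, so the count grows quickly as more noun chunks and entities are extracted.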
So, from this large set of candidate sentences, I need to find the one that is most similar to the original question.
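For reference, the ranking step could be sketched as below. It uses plain token-overlap (Jaccard) similarity purely as a placeholder baseline. That choice, the helper names `tokenize`, `jaccard`, and `most_similar`, and the example sentences are assumptions for illustration, not something I have settled on.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def most_similar(query, candidates):
    """Return the candidate with the highest similarity to the query."""
    return max(candidates, key=lambda candidate: jaccard(query, candidate))


query = "Which employees in the Sales department earn a salary?"
candidates = [
    "Find name where city is Boston",
    "List salary where department is Sales",
]
best = most_similar(query, candidates)
print(best)
```

Any other scoring function with the same `(query, candidate) -> float` shape could be dropped in place of `jaccard`, which is the part I am asking about.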
So my question is: what kind of similarity functions would you recommend? For example, parse tree structure, semantic similarity, syntactic similarity...
Thanks for the help. I am building this as open source, so any help is going to a good cause.