I have a collection of sentences from which I would like to extract those that express the following semantic meaning:
I like Italian cuisine.
Such a sentence can be structured and worded in many different ways. Some examples:
- I enjoy Italian, Chinese, and Indian food.
- Cuisines I love are Chinese, Italian, and Indian.
- Some cuisines I like include Indian, Italian, and Chinese.
- I like all kinds of cuisines around the world, such as Italian, Chinese, and Indian.
What is a good way to approach this problem?
I am no expert in NLP, but here is one approach I came up with:
- Find synonyms for 'like' (e.g. 'enjoy', 'love') and 'cuisine' (e.g. 'food').
- Build a dependency tree for each sentence using a parser (e.g. the Stanford parser or Parsey McParseface).
- Trim each dependency tree to include only the subject (e.g. 'I'), the verb keyword (e.g. 'like'), the noun keyword (e.g. 'food'), and the noun modifier (e.g. 'Italian'). This can be done by finding a path in the tree that covers all of these nodes.
- Store the trimmed dependency trees of the training sentences.
- Check whether the trimmed dependency tree of a test sentence matches one of the stored training trees.
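The steps above could be sketched in Python roughly as follows. This is a minimal illustration under several assumptions: the synonym lists are hypothetical and hard-coded rather than taken from a thesaurus, the dependency parses are hand-written in a Universal Dependencies style instead of coming from the Stanford parser or Parsey McParseface, and "trimming" is taken to mean keeping the smallest subtree that connects the keyword nodes. All names (`Token`, `KEYWORDS`, `signature`, `is_match`) are made up for this example.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Set, Tuple

class Token(NamedTuple):
    word: str
    head: int    # index of the head token, or -1 for the root
    deprel: str  # dependency label (Universal Dependencies style)

# Step 1 (hypothetical synonym lists): each keyword class maps to the surface
# words that count as that keyword. Real lists could come from WordNet.
KEYWORDS = {
    "I": {"i"},
    "LIKE": {"like", "likes", "enjoy", "love", "adore"},
    "FOOD": {"cuisine", "cuisines", "food", "dishes"},
    "ITALIAN": {"italian"},
}

Edge = Tuple[str, str, str]  # (child label, relation, head label)

def label(word: str) -> str:
    """Return the keyword class of a word, or the lowercased word otherwise."""
    w = word.lower()
    for cls, words in KEYWORDS.items():
        if w in words:
            return cls
    return w

def path_to_root(tokens: List[Token], i: int) -> List[int]:
    """Token indices from token i up to the root, inclusive."""
    path = []
    while i != -1:
        path.append(i)
        i = tokens[i].head
    return path

def signature(tokens: List[Token]) -> Optional[FrozenSet[Edge]]:
    """Step 3: trim the tree to the smallest subtree connecting one token of
    each keyword class and return its edges. None if a keyword is missing."""
    found = {}
    for i, tok in enumerate(tokens):
        cls = label(tok.word)
        if cls in KEYWORDS and cls not in found:
            found[cls] = i
    if len(found) < len(KEYWORDS):
        return None
    paths = [path_to_root(tokens, i) for i in found.values()]
    # The deepest node shared by every root path is the lowest common ancestor.
    common = set(paths[0]).intersection(*paths[1:])
    lca = next(n for n in paths[0] if n in common)
    nodes = set()
    for path in paths:
        for n in path:  # walk upward from the keyword, stopping at the LCA
            nodes.add(n)
            if n == lca:
                break
    return frozenset(
        (label(tokens[n].word), tokens[n].deprel, label(tokens[tokens[n].head].word))
        for n in nodes if n != lca
    )

def is_match(tokens: List[Token], training: Set[FrozenSet[Edge]]) -> bool:
    """Step 5: a sentence matches if its trimmed tree equals a stored one."""
    sig = signature(tokens)
    return sig is not None and sig in training

# Step 2 stand-in: hand-written parses (hypothetical, not real parser output).
TRAIN = [  # "I like Italian cuisine."
    Token("I", 1, "nsubj"), Token("like", -1, "root"), Token("Italian", 3, "amod"),
    Token("cuisine", 1, "obj"), Token(".", 1, "punct")]
POSITIVE = [  # "I enjoy Italian, Chinese, and Indian food."
    Token("I", 1, "nsubj"), Token("enjoy", -1, "root"), Token("Italian", 8, "amod"),
    Token(",", 4, "punct"), Token("Chinese", 2, "conj"), Token(",", 7, "punct"),
    Token("and", 7, "cc"), Token("Indian", 2, "conj"), Token("food", 1, "obj"),
    Token(".", 1, "punct")]
NEGATIVE = [  # "Italian people like the food I cook."
    Token("Italian", 1, "amod"), Token("people", 2, "nsubj"), Token("like", -1, "root"),
    Token("the", 4, "det"), Token("food", 2, "obj"), Token("I", 6, "nsubj"),
    Token("cook", 4, "acl:relcl"), Token(".", 2, "punct")]

training = {signature(TRAIN)}  # Step 4: store trimmed trees of training sentences
print(is_match(POSITIVE, training))  # → True
print(is_match(NEGATIVE, training))  # → False
```

Because matching is exact, the negative sentence is rejected even though it contains all four keywords, which is the main advantage over plain keyword search. The flip side is that a differently structured sentence such as "Cuisines I love are Chinese, Italian, and Indian." would likely yield a different trimmed tree under UD-style parses, so it would only be recognized if a training sentence with the same structure had been stored.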
Any ideas, suggestions, and/or comments would be much appreciated!