I have a collection of sentences from which I would like to extract those that express the following semantic meaning:

I like Italian cuisine.

There are many variations of how such a sentence can be structured and worded. Some examples:

  • I enjoy Italian, Chinese, and Indian food.
  • Cuisines I love are Chinese, Italian, and Indian.
  • Some cuisines I like include Indian, Italian, and Chinese.
  • I like all kinds of cuisines around the world, such as Italian, Chinese, and Indian.

What is a good way to approach this problem?

I am no expert in NLP. Here is just something I could think of:

  • Find synonyms for 'like' and 'cuisine'
  • Build dependency trees for sentences using a parser (Stanford or Parsey McParseface)
  • Trim the dependency tree to only include the subject (e.g. 'I'), the verb keyword (e.g. 'like'), the noun keyword (e.g. 'food'), and the noun modifier (e.g. 'Italian'). This can be done by finding a path covering all these nodes in the tree.
  • Store a collection of dependency trees of training sentences.
  • Check if the dependency tree of a testing sentence exists in the training collection.
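
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two parses are hand-written stand-ins for what a dependency parser might return (the relation labels are illustrative), `SYNONYMS`, `KEYWORDS`, `canon`, and `trimmed_signature` are hypothetical names, and every token is assumed to appear only once per sentence.

```python
# Hand-written parses (an assumption): token -> (head, relation).
# A real system would obtain these from a dependency parser.
PARSE_TRAIN = {  # "I enjoy Italian, Chinese, and Indian food."
    "I": ("enjoy", "nsubj"),
    "enjoy": (None, "root"),
    "food": ("enjoy", "obj"),
    "Italian": ("food", "amod"),
    "Chinese": ("Italian", "conj"),
    "Indian": ("Italian", "conj"),
    "and": ("Indian", "cc"),
}
PARSE_TEST = {  # "I really love Italian cuisine."
    "I": ("love", "nsubj"),
    "love": (None, "root"),
    "really": ("love", "advmod"),
    "cuisine": ("love", "obj"),
    "Italian": ("cuisine", "amod"),
}

# Hypothetical synonym map and keyword set for the target meaning.
SYNONYMS = {"enjoy": "like", "love": "like", "food": "cuisine"}
KEYWORDS = {"i", "like", "cuisine", "italian"}


def canon(token):
    """Lowercase a token and map it to its canonical synonym."""
    t = token.lower()
    return SYNONYMS.get(t, t)


def trimmed_signature(parse):
    """Keep only the edges on the paths from keyword tokens up to the root."""
    edges = set()
    for token in parse:
        if canon(token) not in KEYWORDS:
            continue
        node = token
        while parse[node][0] is not None:
            head, rel = parse[node]
            edges.add((canon(head), rel, canon(node)))
            node = head
    return frozenset(edges)


# Store the trimmed trees of the training sentences, then look up a test tree.
training = {trimmed_signature(PARSE_TRAIN)}
print(trimmed_signature(PARSE_TEST) in training)  # True
```

Walking every keyword up to the root is a simplification: it works here because the verb is the root, but in longer sentences it would also keep edges above the part of the tree that actually connects the keywords.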

Any ideas, suggestions, and/or comments would be much appreciated!

  • This is a relation extraction task, and like all NLP tasks it is not easy. Stanford CoreNLP has a relation extraction module. See if you can use it or train it on your data. – Aris F. Jun 01 '16 at 21:25

1 Answer

I think you are on the right track. My idea relies on the synonyms you have already identified (for example, "enjoy" = "like" = "love", "food" = "cuisine"). If you look at your corpus, you will find that all the sentences share a common pattern, i.e.

--- I --- enjoy/like/love --- Italian ---

Here "---" stands for any other tokens in the sentence. You can use a sequential pattern mining algorithm (e.g. PrefixSpan) to discover this pattern first. Once this step is done, you are fairly close to the answer. To add the word "cuisine" at the end of the pattern, you will probably need the Stanford Dependency Parser to get the dependencies and extract the pair consisting of the words "Italian" and "cuisine". Finally, combine the two results to get your answer. To test a sentence, simply check whether it contains the pattern.
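
A minimal sketch of the mining and matching steps, in plain Python. It is a simplified PrefixSpan for sequences of single tokens, not a tuned implementation, and it skips the dependency step. The names `SYNONYMS`, `normalize`, `prefixspan`, and `contains` are hypothetical, and the synonym map is an assumption written by hand for this example.

```python
import re

# Hypothetical synonym map (an assumption; could be built by hand or from a thesaurus).
SYNONYMS = {"enjoy": "like", "love": "like", "food": "cuisine", "cuisines": "cuisine"}

TRAINING = [
    "I enjoy Italian, Chinese, and Indian food.",
    "Cuisines I love are Chinese, Italian, and Indian.",
    "Some cuisines I like include Indian, Italian, and Chinese.",
    "I like all kinds of cuisines around the world, such as Italian, Chinese, and Indian.",
]


def normalize(sentence):
    """Lowercase, keep alphabetic tokens, and map synonyms to one canonical word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [SYNONYMS.get(t, t) for t in tokens]


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every frequent ordered (gappy) pattern."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep what follows the first occurrence of the item.
            suffixes = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, suffixes)

    mine((), sequences)
    return results


def contains(tokens, pattern):
    """True if pattern occurs in tokens in order, with arbitrary gaps."""
    it = iter(tokens)
    return all(p in it for p in pattern)


patterns = prefixspan([normalize(s) for s in TRAINING], min_support=len(TRAINING))
target = ("i", "like", "italian")
print(patterns[target])                                           # 4
print(contains(normalize("I really love Italian food"), target))  # True
print(contains(normalize("Italian people like me"), target))      # False
```

The miner also returns uninteresting patterns built from words such as "and", so in practice the target pattern still has to be selected, for example by keeping only patterns made of the keywords.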

This approach has limitations: it fails if no such pattern exists, or if the syntax of the sentences is too complicated for a pattern to be found. It also does not operate at the semantic level, so I am very interested in other people's answers on how to handle that. I will update this answer if I get ideas about how to solve the problem at a truly semantic level.

Hope it helps.

Raymond Chen