I'm trying to implement NER (Named Entity Recognition) using Stanford NLP. The final goal is to convert free text into a query format. I created a custom dictionary and am able to extract entities and build a query. For example, for the input

people who are from newyork

I build the query

     select * from people where region = 'newyork'

But the problem arises when the statement is negated:

people who are not from newyork

How can I detect the negation in this statement and tell which entity it applies to? Is there any way to do this, even outside of Stanford NLP?

Any help is appreciated.

tourist

2 Answers

1

I know of two ways to implement a negation relation:

  • Define a custom property "not a ..." and apply it everywhere.
  • Use a knowledge database: extract LOCATIONs from the data and define "not from smth" as "LOCATION is not smth".

I used the second approach successfully, but note that I was able to restrict my domain to a finite set of subjects and relations. I found Stanford's typed dependencies incredibly useful; they might help you too (to find those "from smth" relations).
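
A minimal sketch of that idea, assuming the parser output has already been reduced to (relation, governor, dependent) triples. The triples below are hand-written for illustration and are not actual parser output; the exact relation names and heads depend on the parser and its version. The function `build_where` and the `entities` mapping are hypothetical names, not part of any Stanford API. An entity is treated as negated when its governor also has a `neg` dependent, so each negation stays tied to the entity it applies to.

```python
def build_where(triples, entities):
    """Build a WHERE clause from dependency triples.

    triples:  iterable of (relation, governor, dependent) strings
    entities: dict mapping an entity token to (column, value)
    """
    # Governors that have a negation word attached to them.
    negated = {gov for rel, gov, dep in triples if rel == "neg"}

    clauses = []
    for rel, gov, dep in triples:
        if dep in entities:
            column, value = entities[dep]
            # Negate only entities whose governor carries a negation.
            op = "<>" if gov in negated else "="
            clauses.append(f"{column} {op} '{value}'")
    return " AND ".join(clauses)


# Hand-written triples for "people who are not from newyork" (illustrative).
triples = [
    ("rcmod", "people", "are"),
    ("nsubj", "are", "who"),
    ("neg", "are", "not"),
    ("prep_from", "are", "newyork"),
]
entities = {"newyork": ("region", "newyork")}

where = build_where(triples, entities)
print("select * from people where " + where)
```

Because the check is made per governor, a sentence with several negations would only flip the operator for the entities attached to a negated head, under the assumption that the parser attaches each "not" to the right word.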

dveim
  • I tried the first approach. The problem with it is that if there are multiple negations in a sentence, we can't tell which entity was negated – tourist Sep 14 '16 at 09:45
  • The disadvantage of the first approach is that you need N negations for N entities, so the number of extra properties grows too fast. Try the second approach; it has an additional benefit: you can map your relations onto a DB and use SQL to query the information. – dveim Sep 14 '16 at 10:21
1

What you want to do is called a 'natural language interface to database', and Stanford NLP NER (based on CRF sequence models) might not be an appropriate solution for this task. CRF-based NER works well when the meaning of a named entity depends on the semantic context of the sentence: person names, company names, countries, etc. The recognizer is trained on annotated text with the names marked, and this approach is not really usable for named entities that come from a database.

Rule-based recognizers are much better in this case:

  • you don't need to train them: it is enough to keep the dictionaries used by the rule engine up to date (list of data table names, their columns, etc.)
  • you can easily add the custom parsers you need: for dates, for numbers / conditions, for logical operators (like "not", "or", "and")
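
As a rough illustration of the rule-based idea (not the API of any particular library), the sketch below uses plain dictionaries for table names and known values and treats "not" as an operator that negates the next recognized value. The names `TABLES`, `VALUES`, `NEGATIONS` and `to_sql` are assumptions made up for this example, and the whitespace tokenizer is deliberately naive.

```python
# Dictionaries the rule engine relies on; in practice they would be
# generated from the database schema and kept up to date.
TABLES = {"people"}
VALUES = {"newyork": "region"}   # known value -> column it belongs to
NEGATIONS = {"not"}


def to_sql(text):
    """Translate a simple free-text phrase into a SELECT statement."""
    table = None
    conditions = []
    negate = False
    for token in text.lower().split():
        if token in TABLES and table is None:
            table = token
        elif token in NEGATIONS:
            # Remember the negation until the next recognized value.
            negate = True
        elif token in VALUES:
            op = "<>" if negate else "="
            conditions.append(f"{VALUES[token]} {op} '{token}'")
            negate = False
    if table is None:
        raise ValueError("no known table name found")
    sql = f"select * from {table}"
    if conditions:
        sql += " where " + " and ".join(conditions)
    return sql


print(to_sql("people who are from newyork"))
print(to_sql("people who are not from newyork"))
```

Resetting `negate` after each value keeps a "not" from leaking onto later entities; handling "or" / "and" or dates would mean adding further token rules in the same loop.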

You may want to take a look at my library, which was written specifically for recognizing natural language queries: NLQuery.

Vitaliy Fedorchenko