
I need to solve the following problems using NLP. Can you give me pointers on how to achieve this with the OpenNLP API?

a. How to find out if a sentence implies a certain action in the past, present or future.

(e.g.) I was very sad last week - past
       I feel like hitting my neighbor - present
       I am planning to go to New York next week - future

b. How to find the word which corresponds to a person or company or country

(e.g.) John is planning to specialize in Electrical Engineering in UC Berkley and pursue a career with IBM.

Person = John

Company = IBM

Location = Berkley

Thanks

SST
  • That's called named entity recognition (NER). There are lots of packages for it. – Fred Foo Aug 01 '13 at 09:51
  • This [documentation](http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html) includes `NER` and everything else. – Steve P. Aug 01 '13 at 09:59
  • Have a look at my answer [in this question](http://stackoverflow.com/questions/17770734/extract-relevant-sentences-to-entity/17845053#17845053); I also include a small example of how to use NER (but with Python NLTK). – dkar Aug 01 '13 at 12:55
  • With OpenNLP NER, "opennlp TokenNameFinder en-ner-person.bin" seems to find person names properly, but when I run the organization or location model, it doesn't work for me. What am I doing wrong? – SST Aug 02 '13 at 03:51

2 Answers


I can provide a solution for part (b). Here is the code:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.Span;

    public class TikaOpenIntro {

        // Directory holding the pre-trained OpenNLP model files
        private static final String MODEL_DIR = "/home/rahul/opennlp/model/";

        public static void main(String[] args) throws IOException {
            TikaOpenIntro toi = new TikaOpenIntro();

            String cnt = "John is planning to specialize in Electrical Engineering in UC Berkley and pursue a career with IBM.";

            // The name finders work on tokens, so tokenize the sentence first
            String[] tokens = toi.tokenization(cnt);

            String names = toi.namefind(tokens);
            String org = toi.orgfind(tokens);

            System.out.println("person name is : " + names);
            System.out.println("organization name is: " + org);
        }

        public String namefind(String[] tokens) throws IOException {
            return findEntities(MODEL_DIR + "en-ner-person.bin", tokens);
        }

        public String orgfind(String[] tokens) throws IOException {
            return findEntities(MODEL_DIR + "en-ner-organization.bin", tokens);
        }

        // Loads a name finder model and returns every entity it finds, one per line
        private String findEntities(String modelPath, String[] tokens) throws IOException {
            // try-with-resources closes the model file even if loading fails
            try (InputStream is = new FileInputStream(modelPath)) {
                TokenNameFinderModel model = new TokenNameFinderModel(is);
                NameFinderME finder = new NameFinderME(model);

                // Each Span is a token range; spansToStrings turns it back into text
                Span[] spans = finder.find(tokens);
                StringBuilder sb = new StringBuilder();
                for (String entity : Span.spansToStrings(spans, tokens)) {
                    sb.append(entity).append('\n');
                }
                return sb.toString();
            }
        }

        public String[] tokenization(String text) throws IOException {
            try (InputStream is = new FileInputStream(MODEL_DIR + "en-token.bin")) {
                TokenizerModel model = new TokenizerModel(is);
                Tokenizer tokenizer = new TokenizerME(model);
                return tokenizer.tokenize(text);
            }
        }
    }

If you want locations as well, load the location model (en-ner-location.bin) the same way. It is available from the OpenNLP models page on SourceForge; you can download it and use it like the other models.
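To get output in the "Person = John / Company = IBM / Location = Berkley" form the question asks for, one option is to run each model separately and label its results by which model produced them. The sketch below shows only that labelling step and runs without OpenNLP: `TokenSpan` is a hypothetical stand-in for `opennlp.tools.util.Span` (a token range `[start, end)`), and the spans in `main` are hand-written for illustration, not actual model output. In real code you would pass the `Span[]` from each `NameFinderME.find` call and read `getStart()`/`getEnd()` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityReport {

    // Hypothetical stand-in for opennlp.tools.util.Span: covers tokens[start, end)
    public static final class TokenSpan {
        final int start;
        final int end;

        public TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Joins the tokens each span covers with spaces, in the spirit of Span.spansToStrings
    public static List<String> spanTexts(List<TokenSpan> spans, String[] tokens) {
        List<String> texts = new ArrayList<>();
        for (TokenSpan s : spans) {
            texts.add(String.join(" ", Arrays.copyOfRange(tokens, s.start, s.end)));
        }
        return texts;
    }

    // Produces "Label = text" lines; the caller chooses the label for each model's spans
    public static List<String> report(Map<String, List<TokenSpan>> spansByLabel, String[] tokens) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<TokenSpan>> entry : spansByLabel.entrySet()) {
            for (String text : spanTexts(entry.getValue(), tokens)) {
                lines.add(entry.getKey() + " = " + text);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "is", "planning", "to", "specialize", "in", "Electrical",
                "Engineering", "in", "UC", "Berkley", "and", "pursue", "a", "career",
                "with", "IBM", "."};
        // Hand-written spans for illustration only, NOT real model output;
        // in practice each list would come from a different name finder model
        Map<String, List<TokenSpan>> found = new LinkedHashMap<>();
        found.put("Person", List.of(new TokenSpan(0, 1)));
        found.put("Company", List.of(new TokenSpan(16, 17)));
        found.put("Location", List.of(new TokenSpan(10, 11)));
        report(found, tokens).forEach(System.out::println);
    }
}
```

Keying the map by model rather than relying on a type stored inside each span keeps the labels under your control, and `LinkedHashMap` preserves the order in which you ran the models.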

I am not sure how accurate name, location, and organization extraction is, but it recognizes almost all names, locations, and organizations.

If you don't find OpenNLP sufficient, use the Stanford Named Entity Recognizer (Stanford NER) instead.

Rahul Kulhari
  • Thank you very much... It worked for me. Do you have any idea about question (a), how to find out if a sentence implies a certain action in the past, present or future? – SST Aug 05 '13 at 12:46
  • The Stanford Parser can help you analyze the sentence, but I am not sure. I haven't tried detecting this kind of action, but I suggest first tagging the sentence with a POS tagger and then applying regular expressions to the result. – Rahul Kulhari Aug 05 '13 at 17:15
  • Does OpenNLP have sentiment analysis in Java? I mean text classification that decides whether a sentence is positive, negative, or neutral. – SST Sep 13 '13 at 11:43
  • I haven't tried this kind of thing, but you can. Ask the OpenNLP project; I think they can tell you. – Rahul Kulhari Sep 13 '13 at 11:55

Finding the literal tense of a sentence is not trivial, but it is doable in some cases. The OpenNLP parser will create a sentence structure from which you can attempt to extract the head verb; a bit of morphological analysis will then tell you whether the verb is present or past (in English), and a bit more work around the modal "will" will give you future tense in some circumstances. But it's not always that simple. For instance, in "Going to Paris drained my bank account", you have an embedded event (going to Paris) which happened in the past, but it's tricky to figure that out. And your future example ("I am planning...") requires some real-world understanding of what the word "plan" means, which is quite complicated. This sort of thing is a topic of ongoing research in natural language processing.
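The verb-based idea above can be approximated with POS tags alone. The following is a minimal sketch, not a parser-based solution: instead of extracting the head verb from a parse, it takes the first finite verb tag (`VBD` for past, `VBP`/`VBZ` for present) and treats `will`/`shall` tagged `MD`, or "going to" followed by a base verb (`VB`), as future. It assumes Penn Treebank tags, the tag set used by OpenNLP's English POS models (for example, `POSTaggerME` with `en-pos-maxent.bin`); the tags in `main` are assigned by hand for illustration, not produced by a tagger.

```java
import java.util.Locale;

public class TenseGuesser {

    public enum Tense { PAST, PRESENT, FUTURE, UNKNOWN }

    // Rough tense guess; tags[i] is the Penn Treebank tag of tokens[i]
    public static Tense guess(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        // Pass 1: explicit future markers win
        for (int i = 0; i < tokens.length; i++) {
            String word = tokens[i].toLowerCase(Locale.ROOT);
            if (tags[i].equals("MD") && (word.equals("will") || word.equals("shall"))) {
                return Tense.FUTURE;
            }
            // "going to" + base verb, e.g. "I am going to leave"
            if (word.equals("going") && i + 2 < tokens.length
                    && tokens[i + 1].equalsIgnoreCase("to") && tags[i + 2].equals("VB")) {
                return Tense.FUTURE;
            }
        }
        // Pass 2: the first finite verb decides (a stand-in for the parse's head verb)
        for (String tag : tags) {
            if (tag.equals("VBD")) {
                return Tense.PAST;
            }
            if (tag.equals("VBP") || tag.equals("VBZ")) {
                return Tense.PRESENT;
            }
        }
        return Tense.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hand-assigned tags for the question's three examples (illustration only)
        System.out.println(guess(
                new String[] {"I", "was", "very", "sad", "last", "week"},
                new String[] {"PRP", "VBD", "RB", "JJ", "JJ", "NN"}));             // PAST
        System.out.println(guess(
                new String[] {"I", "feel", "like", "hitting", "my", "neighbor"},
                new String[] {"PRP", "VBP", "IN", "VBG", "PRP$", "NN"}));          // PRESENT
        System.out.println(guess(
                new String[] {"I", "am", "planning", "to", "go", "to", "New", "York", "next", "week"},
                new String[] {"PRP", "VBP", "VBG", "TO", "VB", "TO", "NNP", "NNP", "JJ", "NN"})); // PRESENT
    }
}
```

The third example prints PRESENT even though the question labels it future. This is exactly the limitation described above: nothing in the tags says that "planning" refers to a future event, so surface patterns like these only cover the grammatically marked cases.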

Sam Bayer