I am interested in natural language processing. Is there a well-known algorithm that can identify a first name and last name in a text and treat them as a single entity?

For example, given this sentence:

Last week John Wayne went to Europe.

I want a tokenizer that gives: "Last", "week", "John Wayne", "went", "to", "Europe".

Any help is appreciated.

TJ1

2 Answers

Grouping multi-word names is an essential part of named entity recognition (NER), and most NER systems do it out of the box in most cases. For example, I ran your sentence through the web interface of the Stanford NER system and got:

Last week <PERSON>John Wayne</PERSON> went to <LOCATION>Europe</LOCATION>.

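The output format depends on the tool you use. The most common one is IOB (inside-outside-beginning) tagging, in which each token is labeled as beginning an entity, continuing one, or lying outside any entity.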
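
To connect IOB output to the tokenizer asked about in the question, here is a minimal sketch of turning IOB-tagged tokens back into multi-word tokens. The tags in `tagged` are written by hand for illustration (they are not the output of any particular tool), and `merge_iob` is a hypothetical helper name.

```python
def merge_iob(tagged):
    """Merge (token, IOB tag) pairs so that each entity becomes one token."""
    merged = []
    prev_type = None  # entity type of the previous token, None outside an entity
    for token, tag in tagged:
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == prev_type:
            # Continuation of the current entity: glue onto the last token.
            merged[-1] = merged[-1] + " " + token
        else:
            # "O", "B-...", or a stray "I-..." starts a new token.
            merged.append(token)
        prev_type = entity_type if prefix in ("B", "I") else None
    return merged


# Hand-written IOB tags for the example sentence (an assumption, not tool output).
tagged = [
    ("Last", "O"), ("week", "O"),
    ("John", "B-PER"), ("Wayne", "I-PER"),
    ("went", "O"), ("to", "O"),
    ("Europe", "B-LOC"), (".", "O"),
]
tokens = merge_iob(tagged)
print(tokens)
# ['Last', 'week', 'John Wayne', 'went', 'to', 'Europe', '.']
```

The merging step itself is language-independent; only the tagger that produces the IOB labels needs to be trained for the language in question.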

mbatchkarov
  • Thanks for the answer. Can you please suggest a few NER algorithms that can do this? I am especially interested in non-English languages, so I would like to know about algorithms rather than tools. – TJ1 Jun 11 '14 at 12:55
  • Do you want to know how NER is done in general or do you want a tool that can do NER for you? – mbatchkarov Jun 11 '14 at 13:12
  • Stanford's NER uses CRF. – Blacksad Jun 11 '14 at 13:24
  • I am interested in knowing how NER is done, especially for non-English languages. – TJ1 Jun 19 '14 at 14:01

If the people mentioned in your text are famous, you can do this:

  • Run the Illinois Wikifier on your text. For example, here it is run on your sentence: http://cogcomp.cs.illinois.edu/demo/wikify/?id=25

  • Combine all the words that the Wikifier links to the same web page. For your example, the output becomes: "Last week John_Wayne went to Europe." You can also record where each combination was made.

Now you can do anything with your text, like giving it to a tokenizer!
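
The combining step can be sketched as follows. This is only an illustration under assumptions: the Wikifier's real output format is not shown here, so the links are represented as hypothetical `(start, end, page_title)` token spans with an exclusive end, and `merge_linked_spans` is a made-up helper name.

```python
def merge_linked_spans(tokens, links):
    """Join tokens that are linked to the same page with underscores.

    links: non-overlapping (start, end, page_title) spans, end exclusive.
    Returns the merged tokens and the list of spans that were combined.
    """
    span_end = {start: end for start, end, _ in links}
    merged, combined = [], []
    i = 0
    while i < len(tokens):
        end = span_end.get(i)
        if end is not None and end > i:
            merged.append("_".join(tokens[i:end]))
            if end - i > 1:
                combined.append((i, end))  # remember where a merge happened
            i = end
        else:
            merged.append(tokens[i])
            i += 1
    return merged, combined


tokens = ["Last", "week", "John", "Wayne", "went", "to", "Europe", "."]
# Hypothetical link spans, written by hand for this example.
links = [(2, 4, "John_Wayne"), (6, 7, "Europe")]
merged, combined = merge_linked_spans(tokens, links)
print(merged)
# ['Last', 'week', 'John_Wayne', 'went', 'to', 'Europe', '.']
print(combined)
# [(2, 4)]
```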

Daniel
  • Thanks for the answer. This is a good tool, but I am looking for an algorithm. Doing this in English is relatively easy because first and last names both start with capital letters. I am more interested in algorithms that can be used for other languages. – TJ1 Jun 11 '14 at 13:04