
Intro to my problem: users can search for terms, and RitaWordNet provides a method called getSenseIds() to get the related senses. I am currently using WS4J (WordNet Similarity for Java, http://code.google.com/p/ws4j/), which offers several algorithms for measuring semantic distance. A search for "user" returns these senses:

  • user
  • exploiter
  • drug user

http://wordnetweb.princeton.edu/perl/webwn?s=user&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=0

The Lin similarity is computed in WS4J by comparing two terms (with targetWord, I assume?):

  • Similarity between: user and: user = 1.7976931348623157E308
  • Similarity between: user and: exploiter = 0.1976958835785797
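For reference, the Lin measure is usually written as below, where IC(c) is the information content of a concept c (estimated from corpus frequencies) and lcs(c1, c2) is the least common subsumer of the two concepts in the WordNet hierarchy. The symbol names are just the conventional ones, not identifiers taken from WS4J.

```latex
\mathrm{sim}_{\mathrm{Lin}}(c_1, c_2)
  = \frac{2 \cdot \mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)}
         {\mathrm{IC}(c_1) + \mathrm{IC}(c_2)},
\qquad
\mathrm{IC}(c) = -\log p(c)
```

Under this definition the score lies between 0 and 1. The first value printed above equals Java's Double.MAX_VALUE, so it looks more like a sentinel for identical inputs than a computed Lin score; that reading is an assumption on my part and not something confirmed from the WS4J source.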

I would like to return to the end-user a suggestion that the "user" sense is the most relevant/correct answer, but the problem is that this depends on the rest of the sentence.

Example: "The old man was a regular user of public transport", "The young man became a drug user while studying NLP...".

I assume that the SenseRelate project includes something that I'm missing. This thread also came up during my search: "word disambiguation algorithm (Lesk algorithm)".

Hopefully someone got my question :)

cloms

1 Answer


You might want to try WordNet::SenseRelate::AllWords - there's an online demo at http://maraca.d.umn.edu
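To give a feel for how a Lesk-style method can use the rest of the sentence to pick a sense, here is a minimal sketch of simplified Lesk in Java: each candidate sense is scored by how many context words it shares with its gloss. This is not how WordNet::SenseRelate::AllWords is implemented. The class name, the stop-word list, and the three glosses are all hypothetical and hand-written for this illustration; a real system would read the glosses from WordNet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SimplifiedLesk {
    // Hypothetical glosses written for this example, NOT the WordNet glosses.
    static final Map<String, String> GLOSSES = new LinkedHashMap<>();
    static {
        GLOSSES.put("user", "a person who makes use of a thing such as a service or public transport");
        GLOSSES.put("exploiter", "a person who uses something or someone selfishly or unethically");
        GLOSSES.put("drug user", "a person who takes an illegal drug");
    }

    // Tiny illustrative stop-word list (an assumption, not a standard list).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "of", "or", "who", "was", "as", "such", "while", "became"));

    // Lowercase, split on non-letters, drop stop words.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Return the sense label whose gloss shares the most words with the sentence.
    // Ties keep the earlier sense in GLOSSES.
    static String disambiguate(String target, String sentence) {
        Set<String> context = tokens(sentence);
        context.remove(target.toLowerCase()); // the target word itself carries no evidence

        String best = null;
        int bestOverlap = -1;
        for (Map.Entry<String, String> sense : GLOSSES.entrySet()) {
            Set<String> shared = tokens(sense.getValue());
            shared.retainAll(context);
            if (shared.size() > bestOverlap) {
                bestOverlap = shared.size();
                best = sense.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // prints "user"
        System.out.println(disambiguate("user",
                "The old man was a regular user of public transport"));
        // prints "drug user"
        System.out.println(disambiguate("user",
                "The young man became a drug user while studying NLP"));
    }
}
```

With real glosses you would likely also want stemming (so that "drugs" matches "drug") and the example sentences of each sense added to the overlap set; the sketch leaves both out to stay short.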

Ted Pedersen
  • Thank you Ted, nice to get answers from the actual inventor of the program, in this and other related posts! :) I had not seen this demo before; it's really interesting. I will try to get the installation issue out of the way and then look deeper into this! I will have to find out how the Lesk algorithm comes up with these senses as the most correct ones. – cloms Nov 19 '13 at 14:01