2

Hi. Can anybody help me find an algorithm, in Java, that finds synonyms of a search word based on its context? I want to implement the algorithm with the WordNet database.

For example, take "I am running a Java program". I want to find synonyms for the word "running", but the synonyms must fit the context.

user330394
  • Here is a Perl implementation of the algorithm: http://senserelate.sourceforge.net/. You may use it from Java code, but it requires some configuration work. – VVV Jul 06 '10 at 11:07

3 Answers

9

Let me illustrate a possible approach:

  1. Let your sentence be A B C
  2. Let each word have synsets i.e. {A:(a1, a2, a3), B:(b1), C:(c1, c2)}
  3. Now form possible synset sets: (a1, b1, c1), (a1, b1, c2), (a2, b1, c1) ... (a3, b1, c2)
  4. Define a function F(a, b, c) that returns a score for how closely related the synsets (a, b, c) are, based on the distances between them.
  5. Call F on each synset set.
  6. Pick the set with the maximum score.

For starters, the function F can just return the product, over every pair of synsets in the set, of the inverse of the distance D between the two nodes (the number of nodes between them):

Maximize(Product[0 <= i < j < len(sentence)] (1/D(node_i, node_j)))

Later on, you can increase its complexity.
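
The steps above can be sketched in Java roughly as follows. This is only an illustration: the synset IDs are plain strings, and `toyDistance` is a hypothetical stand-in for a real WordNet path-length lookup (it is not part of any WordNet library). Distances below 1 are clamped to 1 to avoid dividing by zero, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntBiFunction;

final class SynsetCombinationScorer {

    // F: product of 1/D over every distinct pair (i < j) of chosen synsets.
    static double score(List<String> combo, ToIntBiFunction<String, String> distance) {
        double product = 1.0;
        for (int i = 0; i < combo.size(); i++) {
            for (int j = i + 1; j < combo.size(); j++) {
                // Clamp to 1 so identical or adjacent nodes never divide by zero.
                int d = Math.max(1, distance.applyAsInt(combo.get(i), combo.get(j)));
                product *= 1.0 / d;
            }
        }
        return product;
    }

    // Enumerates one synset per word and returns the highest-scoring combination
    // (null if some word has no candidate synsets).
    static List<String> best(List<List<String>> candidates,
                             ToIntBiFunction<String, String> distance) {
        List<List<String>> combos = new ArrayList<>();
        combos.add(new ArrayList<>());
        for (List<String> options : candidates) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : combos) {
                for (String option : options) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(option);
                    next.add(extended);
                }
            }
            combos = next;
        }
        List<String> best = null;
        double bestScore = -1.0;
        for (List<String> combo : combos) {
            double s = score(combo, distance);
            if (s > bestScore) {
                bestScore = s;
                best = combo;
            }
        }
        return best;
    }

    // Hypothetical distance: 1 for a few "related" pairs, 5 for everything else.
    static int toyDistance(String a, String b) {
        Set<String> close = Set.of("a2|b1", "b1|c2", "a2|c2");
        return close.contains(a + "|" + b) || close.contains(b + "|" + a) ? 1 : 5;
    }

    public static void main(String[] args) {
        List<List<String>> candidates = List.of(
                List.of("a1", "a2", "a3"),  // synsets of word A
                List.of("b1"),              // synsets of word B
                List.of("c1", "c2"));       // synsets of word C
        System.out.println(best(candidates, SynsetCombinationScorer::toyDistance));
        // prints [a2, b1, c2]
    }
}
```

Note that the number of combinations grows multiplicatively with the number of synsets per word, so for long sentences you would probably want to restrict the search to a small window around the target word.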

Prav
  • Umm, no. Distance = the count of nodes between the two words. WordNet is like a connected graph with each synset as a node. The edges are relationships such as hypernyms, hyponyms, etc. – Prav May 11 '10 at 10:39
  • For the Lesk algorithm, D = the count of words shared by the definitions of the synsets – Stephen Denne Jun 10 '10 at 00:17
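
A minimal sketch of the word-overlap count mentioned in the last comment, assuming the synset definitions (glosses) are already available as strings. The glosses in `main` are made up for illustration and are not quoted from WordNet; the tokenisation is deliberately naive (lowercase, split on non-letters, no stop-word filtering). Unlike a path distance, this count is a similarity, so a higher value means a better match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class GlossOverlap {

    // Splits a gloss into its set of distinct lowercase words.
    static Set<String> words(String gloss) {
        String[] tokens = gloss.toLowerCase(Locale.ROOT).split("[^a-z]+");
        Set<String> result = new HashSet<>(Arrays.asList(tokens));
        result.remove("");  // split can yield an empty leading token
        return result;
    }

    // Number of distinct words that appear in both glosses.
    static int overlap(String glossA, String glossB) {
        Set<String> shared = words(glossA);
        shared.retainAll(words(glossB));
        return shared.size();
    }

    public static void main(String[] args) {
        // Made-up glosses for two senses of "run" and one sense of "program".
        String runMove = "move fast by using the feet";
        String runExecute = "carry out a process or program on a computer";
        String program = "a sequence of instructions that a computer can execute";

        System.out.println(overlap(runMove, program));     // prints 0
        System.out.println(overlap(runExecute, program));  // prints 2
    }
}
```

Here the "execute" sense of "run" shares two words ("a" and "computer") with the gloss of "program", so it would be preferred; filtering out stop words such as "a" would make the count more meaningful.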
2

This is the perfect document for your problem. The accuracy of the algorithm is not high, but I think it will be enough.

On this link you can find a Java API for WordNet Searching (JAWS).

izilotti
Radu
1

Hi, I came across this page when I was searching for Lesk algorithm implementations. I think it comes as part of the JAWS package. I haven't used it yet, but I guess it will help.

CTsiddharth