I am running some tests with the ws4j library. In particular, I want to calculate the similarity between two test words, "university" and "teaching". When I apply stemming, the similarity is 0; without stemming, the result is greater than 0. On the other hand, for "genders" and "sex", stemming has the opposite effect: with stemming the similarity is positive, and without it the similarity is 0.

Why does this happen, and what would be a more generic approach that gives consistent results for both examples?

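import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.JiangConrath;
import edu.cmu.lti.ws4j.impl.Path;
import edu.cmu.lti.ws4j.impl.WuPalmer;
import edu.cmu.lti.ws4j.util.PorterStemmer; // assuming the stemmer bundled with ws4j
import edu.cmu.lti.ws4j.util.WS4JConfiguration;

public class TestWs4j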
{    
    private static ILexicalDatabase db = new NictWordNet();
    private static RelatednessCalculator[] rcs = {
            new WuPalmer(db), // new HirstStOnge(db), new LeacockChodorow(db), new Lesk(db),
            new JiangConrath(db), new Path(db) // new Resnik(db), new Lin(db),
    };

    private static void run( String word1, String word2 ) {
        WS4JConfiguration.getInstance().setMFS(true);
        for ( RelatednessCalculator rc : rcs ) {
            double s = rc.calcRelatednessOfWords(word1, word2);
            System.out.println( rc.getClass().getName()+"\t"+s );
        }
    }
    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        PorterStemmer stemmer = new PorterStemmer();
//        String w1 = stemmer.stemWord("university");
//        String w2 = stemmer.stemWord("teaching");
//        run(w1,w2);
        run("university","teaching");
        long t1 = System.currentTimeMillis();
        System.out.println( "Done in "+(t1-t0)+" msec." );
    }
}
Klue
  • Porter stemmers just strip suffixes to get to a base form: university -> univers, teaching -> teach, etc. In many cases the stemmed word is not a valid word. Does your database of related words index the stemmed word or the proper word? – Arthur Jun 08 '16 at 09:22
  • @Arthur: thanks. I use the NICT WordNet db and the jawjaw parser. I assumed that WordNet indexes both the stemmed and the proper words. Doesn't it? – Klue Jun 08 '16 at 09:48
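
One way to read Arthur's point is that a stem helps only when the stem itself is an indexed word. The sketch below illustrates that idea with the standard library only. Everything in it is hypothetical: `LEXICON` is a tiny stand-in for the lemmas a WordNet-style database might index, `toyStem` is a crude suffix stripper (not the real Porter algorithm), and `resolve` is an assumed fallback strategy that tries the original word first and the stem second. It does not call ws4j and makes no claim about how ws4j looks words up internally.

```java
import java.util.Set;

public class StemLookupSketch {
    // Hypothetical stand-in for the lemmas a WordNet-style database indexes.
    static final Set<String> LEXICON =
            Set.of("university", "teaching", "teach", "gender", "sex");

    // Toy suffix stripper for illustration only; NOT the real Porter algorithm.
    static String toyStem(String word) {
        String[] suffixes = {"ity", "ing", "s"};
        for (String suffix : suffixes) {
            // Keep at least three characters so short words are left alone.
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Assumed fallback: prefer the word as written, then its stem, else null.
    static String resolve(String word) {
        if (LEXICON.contains(word)) {
            return word;
        }
        String stem = toyStem(word);
        return LEXICON.contains(stem) ? stem : null;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"university", "teaching", "genders", "sex"}) {
            System.out.println(w + " -> stem=" + toyStem(w) + ", lookup form=" + resolve(w));
        }
    }
}
```

Under these assumptions, "university" is found as written while its stem is not an indexed word, and "genders" is found only after stemming, which mirrors the two behaviours described in the question. With ws4j, the form returned by a helper like `resolve` would be what gets passed to `calcRelatednessOfWords`; a lemmatizer that maps inflected forms to dictionary words would be a natural replacement for `toyStem`.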
