I'm working on a project using WordNet and JWI 2.4.0. Currently, I'm running a lot of words through the included stemmer, and it seemed to work until I asked for "order". The stemmer tells me that "order", "orde", and "ord" are the possible stems of "order". I'm not a native English speaker, but I have never seen the word "ord" in my life, and when I asked the WordNet dictionary for its definition, there was obviously nothing. (On BabelNet online, I found that it is a town in Nebraska!)

So why is there this strange stem? How can I filter out the stems that are not present in the WordNet dictionary? (When I reuse the stemmed words, "orde" makes the program crash.)

Thank you !

ANSWER: I didn't properly understand what a stem is, so this question doesn't really make sense.

Here is some code to test :

package JWIExplorer;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Arrays;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.morph.WordnetStemmer;

public class TestJWI
{

    public static void main(String[] args) throws IOException
    {
        List<String> WordList_Research = Arrays.asList("dog", "cat", "mouse");
        List<String> WordList_Research2 = Arrays.asList("order");

        String path = "./" + File.separator + "dict";
        URL url;

        url = new URL("file", null, path);

        System.out.println("BEGIN : " + new Date());

        for (Iterator<String> iterstr = WordList_Research2.iterator(); iterstr.hasNext();)
        {
            String str = iterstr.next();

            TestStem(url, str);
        }

        System.out.println("END : " + new Date());
    }

    public static void TestStem(URL url, String ResearchedWord) throws IOException
    {
        // construct the dictionary object and open it
        IDictionary dict = new Dictionary(url);
        dict.open();

        // First, let's check for the stem word
        WordnetStemmer Stemmer = new WordnetStemmer(dict);
        List<String> StemmedWords;

        // null for all parts of speech, POS.NOUN for nouns only
        StemmedWords = Stemmer.findStems(ResearchedWord, null);

        // print every candidate stem (the loop is simply skipped if there are none)
        for (String str : StemmedWords)
        {
            System.out.println("Local stemmed iteration on : " + str);
        }

        // release the dictionary once we are done with it
        dict.close();
    }

}
Metalman

1 Answer

Stems do not necessarily need to be words by themselves. "Order" and "Ordinal" share the stem "Ord".

The fundamental problem here is that stems are related to spelling, but language evolution and spelling are only weakly related (especially in English). As programmers, we'd much rather describe a stem as a regex, e.g. `^ord[ie]`. This captures that "ord" is not the stem of "ordained".
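
To make the regex idea concrete, here is a minimal sketch using `java.util.regex`. The class name `StemRegexDemo` and the helper `hasOrdStem` are made up for illustration and are not part of JWI or WordNet. The pattern is the one from the paragraph above, and lowercasing the input first is an extra assumption.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

public class StemRegexDemo {
    // "ord" at the start of the word, followed by 'i' or 'e'.
    private static final Pattern ORD_STEM = Pattern.compile("^ord[ie]");

    // Hypothetical helper: true if the word starts with "ord" plus 'i' or 'e'.
    public static boolean hasOrdStem(String word) {
        return ORD_STEM.matcher(word.toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args) {
        for (String word : Arrays.asList("order", "ordinal", "ordained")) {
            System.out.println(word + " -> " + hasOrdStem(word));
        }
    }
}
```

With this pattern, "order" and "ordinal" match while "ordained" does not, because the character after "ord" is an "a".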

MSalters
  • Ouch, so my mistake is that I didn't understand what a stem is? Thank you then. – Metalman Oct 06 '17 at 12:41
  • @Metalman: Linguistics is not as exact a science as comp.sci or math. "Stem" does not have an absolutely precise definition, but you indeed seem to have thought that stems must be words. That's not true, at least not in WordNet's definition of stems. – MSalters Oct 06 '17 at 12:47
  • Exactly. Thank you for answering it ! – Metalman Oct 06 '17 at 12:49
  • Ah, I have another related question: is the lemma what I was talking about? While reading the JWI documentation, I found something really close to what I had in mind (the minimal existing word). [link to getLemma()](http://projects.csail.mit.edu/jwi/api/edu/mit/jwi/item/IWordID.html#getLemma()) – Metalman Oct 06 '17 at 13:22
  • @Metalman: I guess that depends on what you mean by "minimal". If you mean that in the Java `String.length` sense, it might not be minimal. Also, the lemma might not be an actual prefix. But I _do_ expect `getLemma` to return "order" given "order", and it should always return a word. More generally, I expect it to be idempotent. `getLemma(getLemma(X))` should return the same as `getLemma(X)` for any X. – MSalters Oct 06 '17 at 13:32
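
Coming back to the original crash on "orde": since candidate stems are not guaranteed to be words, one option is to keep only the candidates that the dictionary actually knows before reusing them. The sketch below shows the idea with a plain `Set<String>` standing in for the dictionary. The class `StemFilterDemo`, the helper `keepKnownWords` and the word list are all hypothetical. With JWI, the membership test would presumably become a check that `dict.getIndexWord(candidate, pos)` is non-null for the parts of speech of interest, but that is an assumption and is not tested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StemFilterDemo {
    // Stand-in for a real dictionary lookup; these entries are illustrative only.
    private static final Set<String> KNOWN_WORDS =
            new HashSet<>(Arrays.asList("order", "dog", "cat", "mouse"));

    // Keep only the candidates that are dictionary entries, preserving their order.
    public static List<String> keepKnownWords(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (KNOWN_WORDS.contains(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The three candidates reported by the stemmer in the question.
        List<String> stems = Arrays.asList("order", "orde", "ord");
        System.out.println(keepKnownWords(stems));
    }
}
```

With the stand-in word list, only "order" survives the filter, so "orde" and "ord" never reach the later lookups.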