I have been searching for a Java library that can give me the "frequency count" of a synset. I checked JWNL and JWI, and they don't provide this information. Does anybody know of other Java WordNet APIs?
Not quite a duplicate, but a very similar question here: http://stackoverflow.com/q/20936502/841830 – Darren Cook Jan 23 '14 at 13:45
3 Answers
I believe this can be done with JWI as well, but it's not very intuitive.
Let's start with a lemmatized word. If you have a word that is not lemmatized, you should use a lemmatizer before searching for the word using JWI.
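    import edu.mit.jwi.IRAMDictionary;
    import edu.mit.jwi.RAMDictionary;
    import edu.mit.jwi.data.ILoadPolicy;
    import edu.mit.jwi.item.*;

    String lemma = ... // the lemmatized word
    // WN_DIR is a java.io.File pointing at the WordNet dictionary directory
    IRAMDictionary dict = new RAMDictionary(WN_DIR, ILoadPolicy.IMMEDIATE_LOAD);
    dict.open(); // the dictionary must be opened before any lookup (throws IOException)
    IIndexWord indexWord = dict.getIndexWord(lemma, POS.NOUN); // or POS.VERB, etc.
    if (indexWord != null) { // null when the lemma is not found for this part of speech
        for (IWordID id : indexWord.getWordIDs()) {
            IWord word = dict.getWord(id);
            // the tag count lives on the sense entry, looked up by sense key
            int count = dict.getSenseEntry(word.getSenseKey()).getTagCount();
            System.out.println("Synset: " + word.getSynset().getGloss());
            System.out.println("Frequency: " + count);
        }
    }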
This may look overly complicated, but note that we started with a word for this little code snippet, not a synset!
In JWI, each IWord uniquely identifies a synset (although a synset will typically have more than one word in it), so the approach to computing the frequency of each word sense is quite counter-intuitive (at least it was to me).
The count is given by the getTagCount() method, for which the documentation states:
Returns the tag count for the sense entry. A tag count is a non-negative integer that represents the number of times the sense is tagged in various semantic concordance texts. A count of 0 indicates that the sense has not been semantically tagged.
Keep in mind, though, that the sense counts in WordNet are horribly outdated (as far as I can recall, they have not been updated since 2001).
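If what you actually need is one number per synset rather than per word sense, one option is to add up the tag counts of all senses that share a synset. As far as I know, the tag count comes from the tag_cnt field of WordNet's index.sense file, where each line has the form "sense_key synset_offset sense_number tag_cnt". Below is a minimal sketch that does the summing on such lines using only the standard library. The class name SenseIndexCounts, the method name, and the sample lines are made up for illustration, and summing per synset is my assumption about what "frequency of a synset" should mean, not something WordNet defines.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SenseIndexCounts {

    /**
     * Sums tag_cnt over all senses of each synset.
     * Keys are "<ss_type>:<synset_offset>", because offsets alone
     * are only unique within one part of speech.
     */
    public static Map<String, Integer> tagCountsBySynset(List<String> lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String line : lines) {
            // expected fields: sense_key synset_offset sense_number tag_cnt
            String[] f = line.trim().split("\\s+");
            if (f.length != 4) {
                continue; // skip lines that do not match the format
            }
            String senseKey = f[0];
            int pct = senseKey.indexOf('%');
            if (pct < 0 || pct + 1 >= senseKey.length()) {
                continue; // not a well-formed sense key
            }
            // the digit right after '%' is the synset type (1 = noun, 2 = verb, ...)
            char ssType = senseKey.charAt(pct + 1);
            String synsetKey = ssType + ":" + f[1];
            totals.merge(synsetKey, Integer.parseInt(f[3]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // illustrative lines in index.sense format; the numbers are made up
        List<String> sample = Arrays.asList(
                "dog%1:05:00:: 02084071 1 42",
                "domestic_dog%1:05:00:: 02084071 1 0",
                "dog%2:38:00:: 02001858 1 2");
        System.out.println(tagCountsBySynset(sample));
    }
}
```

With JWI you could get the same total by iterating over synset.getWords() and summing getTagCount() for each word's sense key, at the cost of one sense-entry lookup per word.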

Each synset has a frequency indicator, based on corpora.
JAWS - http://lyle.smu.edu/~tspell/jaws offers Synset#getTagCount
I'm not sure about JWNL and JWI, but look for the synset APIs in these libraries.
Note (personal opinion): do not trust this frequency indicator; it is seriously misleading.

extJWNL does support tag count http://extjwnl.sourceforge.net/javadocs/net/sf/extjwnl/data/Word.html#getUseCount%28%29 – Amit G Jan 23 '14 at 14:19
extJWNL's Word class has a getUseCount() method, which returns what you want.
Javadocs: http://extjwnl.sourceforge.net/javadocs/index.html
For example:
    // dictionary is an already initialized extJWNL Dictionary; exampleWord is the lemma to look up
    IndexWord word = dictionary.lookupIndexWord(POS.NOUN, exampleWord);
    List<Synset> synset = word.getSenses();
    int nums = word.sortSenses(); // sorts the senses by use count
    // for each sense of the word
    for (Synset syn : synset) {
        // get the synonyms of the sense
        PointerTargetTree s = PointerUtils.getSynonymTree(syn, 2 /*depth*/);
        List<PointerTargetNodeList> l = s.toList();
        for (PointerTargetNodeList nl : l) {
            for (PointerTargetNode n : nl) {
                Synset ns = n.getSynset();
                if (ns != null) {
                    List<Word> ws = ns.getWords();
                    for (Word ww : ws) {
                        // ww.getUseCount() is the frequency of occurrence as reported by WordNet
                        System.out.println(ww.getLemma() + ": " + ww.getUseCount());
                    }
                }
            }
        }
    }
