I have a small Python script that uses nltk.corpus.wordnet to look up lemma frequencies. However, the results seem very odd. Even for the canonical example word 'dog' I get 0 for most lemmas. In Chinese, almost everything seems to register as zero:

dog
 {'dog': 2, 'domestic_dog': 0, 'Canis_familiaris': 0, 'frump': 0, 'cad': 0, 'bounder': 0, 'blackguard': 0, 'hound': 1, 'heel': 0, 'frank': 0, 'frankfurter': 0, 'hotdog': 0, 'hot_dog': 0, 'wiener': 0, 'wienerwurst': 0, 'weenie': 0, 'pawl': 0, 'detent': 0, 'click': 0, 'andiron': 0, 'firedog': 0, 'dog-iron': 0, 'chase': 10, 'chase_after': 0, 'trail': 3, 'tail': 0, 'tag': 0, 'give_chase': 0, 'go_after': 7, 'track': 1}

狗 
{'犬': 0, '狗': 0}

Other Chinese words:

{'大学': 0, '学院': 0}
{'列为': 0, '比': 0, '比较': 0, '相比': 0}
{'你好': 0, '再见': 0, '喂': 0, '欢迎': 0}
{}
{   '交付': 0,
    '交托': 0,
    '交给': 0,
    '付出': 0,
    '供给': 0,
    '信托': 0,
    '托运': 0,
    '移交': 0,
    '给': 0,
    '给予': 0,
    '运送': 0,
    '递': 0}
{'腹泻': 0}

I would have thought it's my code, but I do occasionally get a non-zero number back.

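from nltk.corpus import wordnet as wn

def check(word, lang='cmn'):
    """Return {lemma name: frequency count} for every lemma in the synsets of `word`."""
    syns = wn.synsets(word, lang=lang)
    counts = {}
    for syn in syns:
        # Lemmas of this synset in the requested language
        for lem in syn.lemmas(lang=lang):
            name = lem.name()
            freq = lem.count()
            # A name shared by several synsets keeps only the last count seen
            counts[name] = freq
    return counts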
dcsan

1 Answer

Please see the answers to this question. The frequency counts in WordNet are not useful, not even for English. You can find frequency counts for many languages here.
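
As far as I can tell, NLTK's lemma counts come from a sense-tagged English count file, which would explain why lemmas in other languages report 0. If external frequency lists are not an option, one workaround is to count occurrences in a corpus of your own. The following is only a minimal sketch: `lemma_frequencies` is a hypothetical helper, the token list is invented, and it assumes the text has already been segmented into words (which Chinese text needs before counting).

```python
from collections import Counter

def lemma_frequencies(tokens, lemmas):
    """Count how often each lemma name occurs in a pre-tokenized corpus."""
    totals = Counter(tokens)
    # Counter returns 0 for tokens it has never seen, so unseen lemmas map to 0.
    # Multiword lemma names (e.g. 'hot_dog') will not match single tokens here.
    return {lemma: totals[lemma] for lemma in lemmas}

# Toy corpus, invented for illustration only
tokens = ["狗", "追", "猫", "狗", "叫", "犬"]
print(lemma_frequencies(tokens, ["犬", "狗"]))  # {'犬': 1, '狗': 2}
```

The lemma names could come from the `check` function in the question (its dictionary keys), with the counts replaced by whatever the corpus gives.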

Suzana