I have a small Python script that uses NLTK's WordNet interface (nltk.corpus.wordnet) to look up lemma frequencies. However, the results seem very odd.
Even for the canonical example word 'dog' I get 0 for most of the lemmas in its synsets.
In Chinese, almost everything seems to register as zero:
dog
{'dog': 2, 'domestic_dog': 0, 'Canis_familiaris': 0, 'frump': 0, 'cad': 0, 'bounder': 0, 'blackguard': 0, 'hound': 1, 'heel': 0, 'frank': 0, 'frankfurter': 0, 'hotdog': 0, 'hot_dog': 0, 'wiener': 0, 'wienerwurst': 0, 'weenie': 0, 'pawl': 0, 'detent': 0, 'click': 0, 'andiron': 0, 'firedog': 0, 'dog-iron': 0, 'chase': 10, 'chase_after': 0, 'trail': 3, 'tail': 0, 'tag': 0, 'give_chase': 0, 'go_after': 7, 'track': 1}
狗
{'犬': 0, '狗': 0}
Other Chinese words:
{'大学': 0, '学院': 0}
{'列为': 0, '比': 0, '比较': 0, '相比': 0}
{'你好': 0, '再见': 0, '喂': 0, '欢迎': 0}
{}
{ '交付': 0,
'交托': 0,
'交给': 0,
'付出': 0,
'供给': 0,
'信托': 0,
'托运': 0,
'移交': 0,
'给': 0,
'给予': 0,
'运送': 0,
'递': 0}
{'腹泻': 0}
I would have thought the problem was in my code, but I do occasionally get a non-zero number back. Here is the script:
from nltk.corpus import wordnet as wn


def check(word, lang='cmn'):
    """Return a dict mapping each lemma name in the word's synsets to its count."""
    syns = wn.synsets(word, lang=lang)
    counts = {}
    for syn in syns:
        for lem in syn.lemmas(lang=lang):
            name = lem.name()
            freq = lem.count()
            # A name that appears in several synsets keeps only the last count seen.
            counts[name] = freq
    return counts
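
One caveat about `check` as written: `counts[name] = freq` keeps only the last value when the same lemma name occurs in more than one synset. A minimal sketch of a variant that sums the counts instead is below. `merge_counts` and `check_summed` are hypothetical helper names, not part of NLTK, and the sketch only changes how repeated names are combined; it does not change what `lem.count()` itself returns.

```python
from collections import defaultdict


def merge_counts(pairs):
    """Sum the counts of (name, count) pairs that share the same name."""
    totals = defaultdict(int)
    for name, freq in pairs:
        totals[name] += freq
    return dict(totals)


def check_summed(word, lang='cmn'):
    """Like check(), but adds up counts for names seen in several synsets."""
    # Imported here so merge_counts can be used without NLTK installed.
    from nltk.corpus import wordnet as wn

    pairs = [
        (lem.name(), lem.count())
        for syn in wn.synsets(word, lang=lang)
        for lem in syn.lemmas(lang=lang)
    ]
    return merge_counts(pairs)
```

Comparing `check_summed(word, lang=...)` with `check(word, lang=...)` should show whether any of the low numbers come from overwriting rather than from the counts themselves.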