I want to compute METEOR scores for several Japanese strings. I have imported nltk and downloaded the wordnet and omw corpora, but the results do not convince me that it is working correctly.

import nltk
from nltk.corpus import wordnet
from nltk.translate.meteor_score import single_meteor_score

nltk.download('wordnet')
nltk.download('omw')

reference = "チップは含まれていません。"
hypothesis = "チップは含まれていません。"

print(single_meteor_score(reference, hypothesis))

This outputs 0.5 but surely it should be much closer to 1.0 given the reference and hypothesis are identical?

Do I somehow need to specify which wordnet language I want to use in the call to single_meteor_score(), for example:

single_meteor_score(reference, hypothesis, wordnet=wordnetJapanese)


1 Answer

Pending review by a qualified linguist, I appear to have found a solution. I found an open-source tokenizer for Japanese, pre-processed all of my reference and hypothesis strings to insert spaces between the Japanese tokens, and then ran single_meteor_score() over the files.
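A likely explanation for the original 0.5: Japanese text has no spaces, so splitting on whitespace leaves the whole sentence as a single token. METEOR then sees one match in one chunk, and its fragmentation penalty gamma * (chunks / matches) ** beta becomes 0.5 * (1 / 1) ** 3 = 0.5, which halves an otherwise perfect score. The sketch below is not NLTK's implementation. It is a simplified, exact-match-only version of the formula (no stemming, no WordNet synonyms), using what I believe are NLTK's default parameters (alpha=0.9, beta=3, gamma=0.5). The function name `exact_match_meteor` and the token list are hypothetical; the segmentation is written by hand for illustration and is not the output of any particular tokenizer.

```python
def exact_match_meteor(reference, hypothesis, alpha=0.9, beta=3.0, gamma=0.5):
    """Simplified METEOR over token lists: exact matches only (assumption)."""
    # Greedily align each hypothesis token to the first unused,
    # identical reference token.
    used = set()
    alignment = []  # (hypothesis_index, reference_index) pairs
    for h_idx, token in enumerate(hypothesis):
        for r_idx, ref_token in enumerate(reference):
            if r_idx not in used and token == ref_token:
                used.add(r_idx)
                alignment.append((h_idx, r_idx))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hypothesis)
    recall = matches / len(reference)
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # A chunk is a run of matches that are adjacent in both sentences.
    chunks = 1
    for (h_prev, r_prev), (h_cur, r_cur) in zip(alignment, alignment[1:]):
        if h_cur != h_prev + 1 or r_cur != r_prev + 1:
            chunks += 1

    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean


sentence = "チップは含まれていません。"

# No spaces in the string, so whitespace splitting yields one token.
untokenized = sentence.split()

# Hypothetical hand-written segmentation, for illustration only.
tokens = ["チップ", "は", "含ま", "れ", "て", "い", "ませ", "ん", "。"]

print(exact_match_meteor(untokenized, untokenized))  # 0.5
print(exact_match_meteor(tokens, tokens))            # roughly 0.9993
```

With nine tokens in a single chunk the penalty shrinks to 0.5 * (1 / 9) ** 3, which is why inserting spaces between tokens moves the score for identical strings close to 1.0. As far as I know, more recent NLTK releases expect the reference and hypothesis as pre-tokenized lists of strings rather than raw strings, so the tokenizer output may need to be passed as lists there instead of being re-joined with spaces.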
