Obtaining METEOR scores for Japanese text

Question

I wish to produce METEOR scores for several Japanese strings. I have imported nltk and downloaded the wordnet and omw corpora, but the results do not convince me that it is working correctly.

import nltk
from nltk.corpus import wordnet
from nltk.translate.meteor_score import single_meteor_score

nltk.download('wordnet')
nltk.download('omw')

reference = "チップは含まれていません。"
hypothesis = "チップは含まれていません。"

print(single_meteor_score(reference, hypothesis))

This outputs 0.5, but surely it should be much closer to 1.0, given that the reference and hypothesis are identical?

Do I somehow need to specify which WordNet language I want to use in the call to single_meteor_score()? For example:

single_meteor_score(reference, hypothesis, wordnet=wordnetJapanese)

score 0 · Answer 1 · answered Jul 22 '21 at 15:40

Pending review by a qualified linguist, I appear to have found a solution. I found an open-source tokenizer for Japanese, pre-processed all of my reference and hypothesis strings to insert spaces between the Japanese tokens, and then ran single_meteor_score() from nltk.translate.meteor_score over the files.
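
For anyone wondering where the 0.5 comes from, here is a rough sketch of my understanding. It assumes the NLTK version in use splits a plain string on whitespace, and it uses what I believe are the default METEOR parameters in NLTK (alpha=0.9, beta=3, gamma=0.5). The helper names and the hand-written segmentation are only for illustration; a real tokenizer may split the sentence differently. It does not call NLTK at all, it just redoes the arithmetic for an identical reference and hypothesis.

```python
def meteor_from_counts(matches, hyp_len, ref_len, chunks,
                       alpha=0.9, beta=3.0, gamma=0.5):
    """METEOR from match statistics (parameter defaults assumed to mirror NLTK)."""
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # Weighted harmonic mean of precision and recall.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: fewer, longer matched chunks give a smaller penalty.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean


def identical_score(tokens):
    """Score when hypothesis == reference: every token matches in one chunk."""
    n = len(tokens)
    return meteor_from_counts(matches=n, hyp_len=n, ref_len=n, chunks=1)


sentence = "チップは含まれていません。"

# No spaces in the string, so a whitespace split yields a single token.
unsegmented = sentence.split()

# Hypothetical segmentation, written by hand for illustration only.
segmented = "チップ は 含ま れ て い ませ ん 。".split()

print(len(unsegmented), identical_score(unsegmented))  # 1 token  -> 0.5
print(len(segmented), identical_score(segmented))      # 9 tokens -> about 0.9993
```

With a single token the chunk-to-match ratio is 1, so the penalty takes its maximum value of 0.5 even for a perfect match. Once the text is segmented the ratio shrinks and the score approaches 1.0. As far as I know, more recent NLTK releases expect already-tokenized lists rather than raw strings, in which case the tokenizer output would be passed in directly instead of being re-joined with spaces.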
