For my bachelor thesis project, I am developing the Natural Language Understanding Unit for a Chatbot. Right now I am facing the following problem:
Take a word like 'Auto', the German equivalent of car. A user might type 'autto' instead, simply because of a small typo (an extra 't'). Also, in a chat interface users rarely follow upper-/lower-case rules and tend to type everything in lower case.
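To make the typo problem concrete: after lowercasing both strings, 'autto' and 'Auto' are only one edit apart. A minimal sketch of the standard Levenshtein edit distance (this is just the textbook dynamic-programming algorithm, not tied to any particular library):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Normalise case first, then compare:
print(levenshtein('autto'.lower(), 'Auto'.lower()))  # → 1 (one extra 't')
```

NLTK also ships an implementation (`nltk.edit_distance`), so in practice you would not need to write this yourself.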
For my NLU algorithm, I need to find the correct Synset in GermaNet (roughly the German equivalent of WordNet for English) for every word. A Synset is a node in the wordnet that abstracts all synonyms of a word sense into a single node. For example, the German words 'Auto' (car) and 'Automobil' have the same meaning and therefore belong to the same Synset.
The question now is: how can I find the correct Synset if I don't have an orthographically correct version of the word? Searching the whole wordnet for every word is computationally too expensive.
I think N-grams might offer a solution to this problem, but I am not aware of a concrete algorithm.
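One common N-gram approach is an inverted index over character n-grams: index every lemma in the lexicon by its trigrams, then look up only those lemmas that share at least one trigram with the (possibly misspelled) query and rank them by Jaccard similarity. The surviving top candidates can then be looked up in GermaNet. This is a generic sketch with a toy lexicon, not GermaNet's actual lemma list:

```python
from collections import defaultdict

def ngrams(word: str, n: int = 3) -> set:
    """Character n-grams of a lowercased, '#'-padded word, so boundaries carry signal."""
    padded = f'#{word.lower()}#'
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def build_index(lexicon, n: int = 3):
    """Inverted index: n-gram -> set of lexicon words containing it."""
    index = defaultdict(set)
    for word in lexicon:
        for g in ngrams(word, n):
            index[g].add(word)
    return index

def candidates(query: str, index, n: int = 3, top_k: int = 3):
    """Words sharing at least one n-gram with the query, ranked by Jaccard similarity."""
    q = ngrams(query, n)
    pool = set().union(*(index.get(g, set()) for g in q)) if q else set()
    scored = [(len(q & ngrams(w, n)) / len(q | ngrams(w, n)), w) for w in pool]
    return [w for _, w in sorted(scored, reverse=True)[:top_k]]

# Toy lexicon standing in for GermaNet's lemma list:
lexicon = ['Auto', 'Automobil', 'Autor', 'Haus', 'Fahrrad']
index = build_index(lexicon)
print(candidates('autto', index))  # → ['Auto', 'Autor', 'Automobil']
```

Because the index prunes the search to words sharing at least one n-gram, you avoid scanning the whole wordnet per input; only the few ranked candidates would then be resolved to Synsets via pygermanet.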
As for what I use for the implementation: Python 3 with NLTK, Stanford CoreNLP, and pygermanet.