I'm trying to write a simple text mining application that tries to determine a German word's gender and plural form.
So, first of all, I need a big wordlist for training. I've searched around but could not find any list that includes either gender or plural.
You could use data from the Durm German Lemmatizer, the Morphisto Lexikon, or the ispell dictionary for German. You might find some other resources by looking at my del.icio.us page with the tag "german".
You could mine the data available in the English and German Wiktionaries. You can download the current data of each here and here.
Be warned, though: wikitext is a rather loose format, and the English and German Wiktionary communities have come up with quite different conventions for using it.
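As a starting point for the German dump, here is a minimal sketch using only the Python standard library. It streams the MediaWiki XML dump and pulls gender and nominative plural out of each page's wikitext. It assumes that German noun entries use the `Deutsch Substantiv Übersicht` template with `Genus` and `Nominativ Plural` parameters, one per line; entries with several genders (`Genus 1`, `Genus 2`, ...), older page layouts, and the English Wiktionary's different templates are not handled. The function names are hypothetical, chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

# Assumption: German Wiktionary noun pages contain a block like
#   {{Deutsch Substantiv Übersicht
#   |Genus=n
#   |Nominativ Plural=Häuser
#   }}
# Other templates or older page layouts are simply skipped.
OVERVIEW_RE = re.compile(r"\{\{Deutsch Substantiv Übersicht(.*?)\n\}\}", re.DOTALL)
# One "|name=value" parameter per line; [ \t] (not \s) so a match never
# spills over into the next line when a value is empty.
PARAM_RE = re.compile(r"^\|[ \t]*([^=\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def extract_noun_info(wikitext: str) -> Optional[Tuple[str, str]]:
    """Return (genus, nominative plural) from one page's wikitext, or None."""
    match = OVERVIEW_RE.search(wikitext)
    if match is None:
        return None
    params = dict(PARAM_RE.findall(match.group(1)))
    genus = params.get("Genus")
    if not genus:
        # Missing, or split into "Genus 1"/"Genus 2" (not handled here).
        return None
    # Plural may be absent (e.g. nouns with no plural); use "" then.
    return genus, params.get("Nominativ Plural", "")


def iter_pages(source) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) pairs from a MediaWiki XML dump (path or file object)."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the export-schema namespace
        if tag == "title":
            title = elem.text
        elif tag == "text":
            yield title, elem.text or ""
        elif tag == "page":
            elem.clear()  # release the finished page; dumps are large
```

With a downloaded dump you could then write something like `for title, text in iter_pages(bz2.open(path)): ...` and keep the titles for which `extract_noun_info` returns a result. Streaming with `iterparse` and clearing each finished `page` element keeps memory use roughly flat, which matters because the full dump does not comfortably fit in memory as one tree.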