5

I'm trying to write a simple text-mining application that predicts a German word's gender and plural form.

So, first of all, I need a big word list for training. I've searched around but could not find any list that includes either gender or plural forms.

Abhranil Das
erickrf

2 Answers

5

You could use data from the Durm German Lemmatizer, the Morphisto Lexikon, or the ispell dictionary for German. You might find some other resources by looking at my del.icio.us page with the tag "german".

hmuelner
  • The Lemmatizer seems too much for what I want, while Morphisto Lexikon had an XML I could use. Nice! – erickrf Jan 09 '11 at 02:10
1

You could mine the data available in the English and German Wiktionaries. You can download the current data of each here and here.

But I warn you, wikitext is quite a fuzzy format and the English and German Wiktionary users came up with quite different ways to employ it for each project.
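To give a feel for what mining that wikitext might involve, here is a rough sketch that pulls gender and plural out of a single noun entry. It assumes the German Wiktionary marks nouns with a `Deutsch Substantiv Übersicht` template carrying `Genus`, `Nominativ Singular` and `Nominativ Plural` fields; that layout, the sample entry, and the helper name `extract_noun_info` are illustrative assumptions, so check them against the actual dump. The English Wiktionary would need different patterns.

```python
import re

# Hypothetical sample entry. The template name and field names are an
# assumption about how German Wiktionary noun pages are laid out.
SAMPLE_WIKITEXT = """\
== Haus ({{Sprache|Deutsch}}) ==
=== {{Wortart|Substantiv|Deutsch}}, {{n}} ===

{{Deutsch Substantiv Übersicht
|Genus=n
|Nominativ Singular=Haus
|Nominativ Plural=Häuser
|Genitiv Singular=Hauses
|Genitiv Plural=Häuser
}}
"""

TEMPLATE_START = "{{Deutsch Substantiv Übersicht"

# One "|key=value" template parameter per line; values may be empty.
FIELD_RE = re.compile(
    r"^\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def extract_noun_info(wikitext):
    """Return (lemma, gender, plural) for one entry, or None if no noun table is found."""
    start = wikitext.find(TEMPLATE_START)
    if start == -1:
        return None
    # Naive: stops at the first "}}", so nested templates inside the
    # table would cut the block short.
    end = wikitext.find("}}", start)
    if end == -1:
        return None
    fields = dict(FIELD_RE.findall(wikitext[start:end]))
    return (
        fields.get("Nominativ Singular"),
        fields.get("Genus"),
        fields.get("Nominativ Plural"),
    )


if __name__ == "__main__":
    print(extract_noun_info(SAMPLE_WIKITEXT))
```

Real entries are messier than this sample (several genders, missing plurals, nested templates), which is exactly the fuzziness mentioned above.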

hippietrail
  • I didn't know those. While they're great, they are waaaay too much for what I want; it's not even worth trying to convert the files. – erickrf Jan 09 '11 at 02:17