I have searched but have not found what I want, which is:

the best and most efficient way to extract all Italian words, etymologies and parts of speech, including plural forms of words (amico, amici), from Wiktionary. I would like to put the result into either a CSV file (though it may be too large) or a MySQL database as pure text (not blobs).

Essentially, I want one record for each Italian word, described in English.

mwdumper keeps crashing too.

Any advice would be welcome!

esponapule

1 Answer

I created a small Java program that extracts the part of speech (verb, noun, adjective, and so on) of each term from the en.wiktionary XML dump, here. It writes TSV but can easily be adapted to another format.
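To make the idea concrete, here is a minimal sketch of that kind of extraction. It is not the program linked above. It assumes that en.wiktionary entries mark the language section with a `==Italian==` heading and the parts of speech with deeper headings such as `===Noun===`, and that the dump wraps each entry in `<page>` with a `<title>` and a `<text>` element. The class name `ItalianPosSketch`, its methods, and the short `POS` list are hypothetical and chosen for illustration only.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ItalianPosSketch {
    // Assumption: a level-2 heading such as "==Italian==" opens a language section.
    private static final Pattern LANGUAGE = Pattern.compile("^==([^=]+)==\\s*$");
    // Assumption: parts of speech appear as level-3 or deeper headings, e.g. "===Noun===".
    private static final Pattern SUBHEADING =
            Pattern.compile("^={3,}\\s*([^=]+?)\\s*={3,}\\s*$");
    // Illustrative, incomplete list of heading names treated as parts of speech.
    private static final List<String> POS = List.of(
            "Noun", "Verb", "Adjective", "Adverb", "Pronoun",
            "Preposition", "Conjunction", "Interjection", "Article");

    /** Returns the part-of-speech headings found inside the Italian section of one entry. */
    static List<String> italianPartsOfSpeech(String wikitext) {
        List<String> found = new ArrayList<>();
        boolean inItalian = false;
        for (String line : wikitext.split("\n")) {
            Matcher lang = LANGUAGE.matcher(line);
            if (lang.matches()) {
                // Every level-2 heading switches the current language section.
                inItalian = lang.group(1).trim().equals("Italian");
                continue;
            }
            if (!inItalian) {
                continue;
            }
            Matcher sub = SUBHEADING.matcher(line);
            if (sub.matches() && POS.contains(sub.group(1)) && !found.contains(sub.group(1))) {
                found.add(sub.group(1));
            }
        }
        return found;
    }

    /** Streams the dump with StAX and returns one "title<TAB>part of speech" row per match. */
    static List<String> toTsvRows(Reader dumpXml) throws XMLStreamException {
        List<String> rows = new ArrayList<>();
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(dumpXml);
        String title = null;
        while (xml.hasNext()) {
            if (xml.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            String name = xml.getLocalName();
            if (name.equals("title")) {
                title = xml.getElementText();
            } else if (name.equals("text") && title != null) {
                for (String pos : italianPartsOfSpeech(xml.getElementText())) {
                    rows.add(title + "\t" + pos);
                }
            }
        }
        xml.close();
        return rows;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny hand-written stand-in for the dump; a real run would pass a FileReader instead.
        String sample = "<mediawiki>"
                + "<page><title>amico</title><revision><text>"
                + "==Italian==\n===Adjective===\n===Noun===\n"
                + "</text></revision></page>"
                + "<page><title>dog</title><revision><text>"
                + "==English==\n===Noun===\n"
                + "</text></revision></page>"
                + "</mediawiki>";
        for (String row : toTsvRows(new StringReader(sample))) {
            System.out.println(row);
        }
    }
}
```

Because the reader is streamed, the whole dump never has to fit in memory, and each row could just as well be written to a CSV file or sent to MySQL through a JDBC insert. Etymologies and plural forms would need additional parsing of the entry text, which this sketch does not attempt.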

Jacopofar
  • Jacopo, I tried to compile it and got this error: `POSfromDump.java:20: error: class GeneraDatabasePOS is public, should be declared in a file named GeneraDatabasePOS.java public class GeneraDatabasePOS {` (Sorry, I have never compiled Java before.) – esponapule May 14 '13 at 23:40
  • You have to save it in a file named after the class, that is, GeneraDatabasePOS.java. You'll also have to change the lines of code that contain the file paths. – Jacopofar May 15 '13 at 07:43
  • Also you'll have to put it into a folder called "generazione" and run it with `java generazione.GeneraDatabasePOS` – Jacopofar May 15 '13 at 07:49
  • You can see an updated version [here](https://github.com/jacopofar/wikidump-tools) or directly download the file "POS_list_IT_mar_2013.txt", it's 8.7 MB and contains 486481 terms. – Jacopofar May 15 '13 at 08:14