0

I'm trying to process text in German and Spanish. Working with English text is straightforward because there are myriad NLP packages for it, but it is not as easy for other languages. I found some packages for German text, but I don't know which one is more accurate. It is even harder to find an NLP package for Spanish, considering that the language has some special characters. The steps I need to perform on the text are sentence splitting, tokenizing, POS tagging, and stemming. In other words, I am looking for a Java library that works on one or both of these languages.

Any information on this topic is appreciated.

SahelSoft

2 Answers

1

I can recommend Freeling; check its Freeling_online_demo. It includes sentence splitting, tokenizing, POS tagging, and other functionality for several languages. I don't know how good it is for German, but for analyzing Spanish it is the best tool I know. I have only used Freeling via Python and the command line, but there are interfaces for Java too, for example Freeling_jaVa_API.

Good luck!

Jason Angel
0

If you're willing to drop the Java requirement, spaCy is a very straightforward, cutting-edge Python library that includes pretrained Spanish and German models.

KonstantinosKokos
  • Thank you @KonstantinosKokos. But I wrote all the code in Java. Is there any way to use spaCy in Java, or is there any other package? – SahelSoft Mar 13 '18 at 08:34
  • I don't think there are spaCy bindings for Java, or any viable alternatives to it outside Python, unfortunately. – KonstantinosKokos Mar 13 '18 at 08:44
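
For the first two steps in the question (sentence splitting and tokenizing), a rough sketch that stays in Java is possible with the JDK's own `java.text.BreakIterator`, which accepts a `Locale`. This is only an illustration and not something proposed in the answers above: the class name `LocaleSegmenter` and its two helper methods are made up for this example, and it does no POS tagging or stemming, which still need a dedicated NLP library. Rule-based splitting of this kind may also break sentences incorrectly after abbreviations.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class LocaleSegmenter {

    // Split text into sentences with the JDK's locale-aware sentence iterator.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    // Split text into word tokens, dropping whitespace and punctuation segments.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Keep only segments that begin with a letter or digit.
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Locale spanish = Locale.forLanguageTag("es");
        // Unicode escapes keep the source independent of the file encoding.
        String es = "El ni\u00f1o comi\u00f3 pan. Luego durmi\u00f3.";
        String de = "Das ist ein Satz. Hier ist noch einer.";

        System.out.println(sentences(de, Locale.GERMAN));
        System.out.println(sentences(es, spanish));
        System.out.println(words(es, spanish));
    }
}
```

Characters such as ñ and ó are ordinary Unicode letters to Java, so they need no special handling here as long as the input is decoded with the right charset.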