
We have reviews written in German and French that need to be analysed and classified as positive, neutral, or negative based on the sentiment they express. We tried some tools that translate the reviews to English, but the accuracy wasn't great because meaning is lost in translation. Is there a specific library that can be used in such a case? Any help is highly appreciated. Thanks in advance.

roht20
  • NLTK includes access to datasets in different languages besides English, you might have some luck there. For example, you can find reference to a German word stemmer here: http://www.nltk.org/api/nltk.stem.html. – jmercouris Jan 24 '19 at 11:00
  • [scikit-learn](https://scikit-learn.org/stable/) is language agnostic and has industrial quality algos for text processing and further classification (provided you have training dataset with labels "positive", "negative", or "neutral"). – Sergey Bushmanov Jan 24 '19 at 11:07
  • Thanks a lot for the valuable inputs. As suggested I will try using the German word stemmer and this might work well for our requirement. – roht20 Jan 25 '19 at 05:09
  • Use dependency parsing in a library (spaCy or StanfordNLP) and build a custom sentiment analyzer based on that. See my blog here: https://tech.goibibo.com/key-topics-extraction-and-contextual-sentiment-of-users-reviews-20e63c0fd7ca , or as a possible shortcut, translate to English and use ready-to-go analyzers like VaderSentiment – DhruvPathak Feb 06 '19 at 12:32

1 Answer


In my opinion, the best Python library for sentiment analysis in German and French is Hugging Face Transformers.

There are now very good Transformer-based models on Hugging Face that are fine-tuned for sentiment analysis in many languages. In my opinion, the best one for German is https://huggingface.co/oliverguhr/german-sentiment-bert and the best one for French is https://huggingface.co/tblard/tf-allocine
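As a rough illustration, here is a minimal sketch of how these two models could be called through the Transformers `pipeline` helper. A few things here are my assumptions rather than anything guaranteed by the model cards: the helper names (`load_classifier`, `classify_reviews`, `normalize_label`) are made up for this example, I assume each model returns labels spelled "positive", "negative", or "neutral" in some casing, and I assume your installed Transformers version can load TensorFlow weights for the French model. As far as I know the French model was trained on binary data, so it may never return "neutral".

```python
from typing import Callable, Dict, List

# Model IDs taken from the links above.
MODEL_IDS = {
    "de": "oliverguhr/german-sentiment-bert",
    "fr": "tblard/tf-allocine",
}

KNOWN_LABELS = {"positive", "negative", "neutral"}


def normalize_label(raw_label: str) -> str:
    """Map a model label to positive/negative/neutral, or 'unknown'.

    Assumption: the models spell their labels like the target classes,
    possibly in upper case. Anything else is reported as 'unknown'.
    """
    label = raw_label.strip().lower()
    return label if label in KNOWN_LABELS else "unknown"


def load_classifier(lang: str) -> Callable:
    """Build a text-classification pipeline for 'de' or 'fr'.

    The import is local so the rest of the module can be used
    (and tested) without downloading any model.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_IDS[lang])


def classify_reviews(reviews: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier on a list of reviews and tidy up the output."""
    results = classifier(reviews)
    return [
        {
            "text": text,
            "sentiment": normalize_label(result["label"]),
            "score": float(result["score"]),
        }
        for text, result in zip(reviews, results)
    ]


if __name__ == "__main__":
    # Downloads the German model on first run.
    german = load_classifier("de")
    for row in classify_reviews(
        ["Das Hotel war wunderbar.", "Der Service war furchtbar."], german
    ):
        print(row)
```

Keeping the label normalization in its own function makes it easy to adjust if a model turns out to use different label names, and passing the classifier in as an argument lets you swap models per language without touching the rest of the code.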

If you can't or don't want to run your own NLP model, you can also use an API such as NLP Cloud, which I developed recently. I have added the above German and French sentiment analysis models to it.

Non-English NLP is still far from perfect. Most datasets are in English only, but the ecosystem is gradually making progress.

Julien Salinas