6

Is there a service or library (free or paid) that takes a piece of text and returns its language?

I need to go over a million blog posts and determine their languages.

J. Dorian
  • 109
  • 3

2 Answers

6

I think this Java library is the best one out there:

https://code.google.com/p/language-detection/

Josh Penn
  • 61
  • 2
1

I've heard good things about langid.py.

Features from the README:

  • Fast
  • Pre-trained over a large number of languages (currently 97)
  • Not sensitive to domain-specific features (e.g. HTML/XML markup)
  • Single .py file with minimal dependencies
  • Deployable as a web service

https://github.com/saffsd/langid.py
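
For the million-blog-post case in the question, a rough sketch might look like the one below. It assumes langid is installed (`pip install langid`) and relies on `langid.classify(text)`, which returns a `(language_code, score)` tuple. The helper names (`default_classifier`, `detect_languages`, `language_counts`) and the `(post_id, text)` input shape are my own assumptions for illustration, not part of langid.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, Tuple

# A classifier maps text to (language_code, score), like langid.classify does.
Classifier = Callable[[str], Tuple[str, float]]


def default_classifier() -> Classifier:
    """Return langid.classify, imported lazily so the helpers below
    can also be exercised with a stand-in classifier."""
    import langid  # third-party package: pip install langid
    return langid.classify


def detect_languages(
    posts: Iterable[Tuple[int, str]], classify: Classifier
) -> Iterator[Tuple[int, str]]:
    """Yield (post_id, language_code) for each (post_id, text) pair.

    Works as a generator, so the posts never need to be held in memory
    all at once.
    """
    for post_id, text in posts:
        lang, _score = classify(text)  # the score is ignored in this sketch
        yield post_id, lang


def language_counts(
    posts: Iterable[Tuple[int, str]], classify: Classifier
) -> Counter:
    """Tally how many posts were assigned to each language code."""
    return Counter(lang for _post_id, lang in detect_languages(posts, classify))
```

Calling `language_counts(posts, default_classifier())` would give a per-language tally. The score langid returns is an unnormalized log-probability by default, so treat it as a relative value rather than a percentage if you want to filter out low-confidence results.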

nedned
  • 3,552
  • 8
  • 38
  • 41