Is there a service or library (free or paid) that takes a piece of text and returns its language?
I need to go through a million blog posts and determine their languages.
I've heard good things about langid.py.
Features from the README:
- Fast
- Pre-trained over a large number of languages (currently 97)
- Not sensitive to domain-specific features (e.g. HTML/XML markup)
- Single .py file with minimal dependencies
- Deployable as a web service
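
For a batch job like the million posts in the question, a minimal sketch might look like the following. It assumes langid is installed (`pip install langid`) and relies on `langid.classify`, which returns a `(language_code, score)` tuple. The helper name `tally_languages` and the `classify` parameter are my own, added only so the classifier can be swapped out or stubbed; they are not part of langid.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple

# Any callable that maps text to (language_code, score), e.g. langid.classify.
Classifier = Callable[[str], Tuple[str, float]]


def tally_languages(posts: Iterable[str],
                    classify: Optional[Classifier] = None) -> Counter:
    """Count how many posts were detected in each language.

    `tally_languages` is a hypothetical helper, not a langid function.
    """
    if classify is None:
        # Imported lazily so the helper also works with a stand-in classifier.
        import langid
        classify = langid.classify

    counts: Counter = Counter()
    for text in posts:
        lang, _score = classify(text)  # the score is ignored in this sketch
        counts[lang] += 1
    return counts
```

Calling `tally_languages(posts)` with any iterable of strings uses langid by default. Since it processes one post at a time, you can feed it a generator that streams posts from disk or a database instead of loading everything into memory.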