Most language codes in the polyglot examples are two characters: 'en', 'es', 'zh', etc. However, I know codes can also carry a region within a language: one example in the detection docs (http://polyglot.readthedocs.io/en/latest/Detection.html) shows 'zh_hant'. I'm not sure whether region detection is optional or whether it is the default.

I can't find a table of the codes polyglot uses, but I need to know the maximum length a code can have. Apparently it is more than two characters.

Bonus:
1. Is there a way to control whether the region is included in polyglot's detection result?
2. Why couldn't I find the table?
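
To make the question concrete, here is a minimal sketch of how such a code might be split into its parts. It assumes a `language[_Script][_REGION]` layout like the one used by ICU-style locale identifiers; that layout and the helper name `split_language_code` are my own assumptions for illustration, not something taken from the polyglot documentation.

```python
import re

# Assumed layout (not confirmed for polyglot): language[_Script][_REGION]
#   language: 2-3 letters, script: 4 letters, region: 2 letters or 3 digits.
# Both "_" and "-" are accepted as separators.
_CODE_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[_-](?P<script>[A-Za-z]{4}))?"
    r"(?:[_-](?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_language_code(code):
    """Split a code such as 'zh_hant' into language, script and region parts."""
    match = _CODE_RE.match(code)
    if match is None:
        raise ValueError("unrecognized language code: %r" % (code,))
    script = match.group("script")
    region = match.group("region")
    return {
        "language": match.group("language").lower(),
        # Normalize case by convention: 'Hant' for scripts, 'TW' for regions.
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


print(split_language_code("en"))
print(split_language_code("zh_hant"))
```

Under this assumed layout the longest possible code would be 12 characters (3 + 1 + 4 + 1 + 3), but whether polyglot ever returns anything that long is exactly what I can't confirm.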

user58446
  • Are you just looking for *ISO 639* language codes, including variants such as *639-3*? I'm not familiar with Polyglot, but what you show us looks remarkably like the ISO standard codes. – High Performance Mark Jul 28 '18 at 07:53
  • Yes, it seems they use 639-1 codes in most cases ('en', 'es', 'zh', etc.), but the example I provided includes what appears to be a 639-3 code. A Google search for 'iso 639-3 zh-hant' returns some results as if it is recognized, but no definitive table. See https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes and https://en.wikipedia.org/wiki/ISO_639-3. – user58446 Jul 28 '18 at 08:04
  • The detector object has both 'code' and 'locale' attributes, but they seem to return the same value for my examples. A search for 'python polyglot detector locale' returns nothing. – user58446 Jul 28 '18 at 08:22

0 Answers