
Is there a meta tag that gives the original language of a webpage, or some library I could use to detect it? For example:

detect_language('https://play.google.com/store/movies/details?id=lzLX-xKfQhE')
==> DE (German)

detect_language('https://itunes.apple.com/jp/movie/gon-garu-zi-mu-ban/id944521490?l=en')
==> JA (Japanese)
David542

1 Answer


The language of both pages is, arguably, English! Much of the content on each page is in another language, but the structure of the page (labels, links, etc.) is in English, and the lang attribute on each page's <html> element agrees with this assessment.

From the Google Play page:

<html lang="en_US">
      ^^^^^^^^^^^^

From the iTunes Store page:

<html prefix="og: http://ogp.me/ns#" xmlns="http://www.apple.com/itms/" lang="en">
                                                                        ^^^^^^^^^
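If the declared language is all you need, you can read it straight from the markup. Below is a minimal sketch using only Python's standard-library html.parser; the names HtmlLangParser and declared_language are hypothetical. It assumes you already have the page's HTML as a string, and it rewrites underscore-style values such as Google Play's en_US into the hyphenated BCP 47 form (en-US) that the HTML lang attribute is supposed to use.

```python
from html.parser import HTMLParser
from typing import Optional


class HtmlLangParser(HTMLParser):
    """Records the lang attribute of the <html> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of (name, value)
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_language(html: str) -> Optional[str]:
    """Return the page's declared language, or None if <html> has no lang attribute."""
    parser = HtmlLangParser()
    parser.feed(html)
    parser.close()
    if parser.lang is None:
        return None
    # Normalise locale-style values (en_US) to BCP 47 style (en-US)
    return parser.lang.replace("_", "-")


print(declared_language('<html lang="en_US"><body></body></html>'))  # en-US
```

To run it against a live page you could fetch the HTML with urllib.request.urlopen(url).read().decode("utf-8"). Keep in mind that both stores may serve different markup today than they did when this answer was written, and the value only tells you what the site declares, not what language the film's title and synopsis are written in.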

Some APIs attempt to detect the language of a piece of text. One (commercial) example is Google Translate's Detect Language call. It's a bit of a toss-up what such an API would make of these pages, though; there's a strong argument that they're both English.
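As a rough illustration of calling such a service, here is a sketch against the Google Cloud Translation "Basic" (v2) detect endpoint using only the standard library. The endpoint URL, the JSON request body {"q": ...}, the key query parameter, and the response shape are assumptions based on the v2 documentation as I recall it, so check the current Google Cloud docs before relying on them. The function names are hypothetical, and a real call needs a billed API key.

```python
import json
import urllib.parse
import urllib.request
from typing import Tuple

# Assumed endpoint for Google Cloud Translation v2 ("Basic") language detection
DETECT_URL = "https://translation.googleapis.com/language/translate/v2/detect"


def parse_detection(payload: dict) -> Tuple[str, float]:
    """Return (language, confidence) for the first candidate of the first input string."""
    # Assumed response shape:
    # {"data": {"detections": [[{"language": "de", "confidence": 0.98, "isReliable": false}]]}}
    best = payload["data"]["detections"][0][0]
    return best["language"], float(best.get("confidence", 0.0))


def detect_language_google(text: str, api_key: str) -> Tuple[str, float]:
    """POST one string to the detect endpoint and return (language, confidence)."""
    url = DETECT_URL + "?" + urllib.parse.urlencode({"key": api_key})
    request = urllib.request.Request(
        url,
        data=json.dumps({"q": text}).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_detection(json.load(response))
```

Parsing is kept in a separate function so it can be checked against a saved response without network access or a key.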

  • So I guess a better question would be how could I detect the language of the title + synopsis on the page? – David542 Mar 14 '15 at 00:04
  • Extract the parts that you care about and pass them to a language detection API? You'd have to have some way to determine which parts those are, though, and that's going to itself depend on the specific sites you're trying to do this to. –  Mar 14 '15 at 00:06
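One possible shape for "extract the parts you care about, then detect" is sketched below. It assumes, hypothetically, that a site exposes the title and synopsis in Open Graph tags (<meta property="og:title"> and <meta property="og:description">). The iTunes page does declare the og: prefix, but each site may need its own selectors, and some sites use name= instead of property=. In place of a real detection API it uses a deliberately crude stand-in: Unicode script ranges for Japanese, Korean and Chinese, then stop-word counts for a few Latin-script languages. All names and word lists here are illustrative.

```python
import re
from html.parser import HTMLParser
from typing import Dict, Optional

# Assumption: these Open Graph properties hold the title and synopsis on the target site
WANTED = ("og:title", "og:description")

# Tiny illustrative stop-word lists; a real detector uses far richer statistics
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "with", "his", "her", "a"},
    "de": {"der", "die", "das", "und", "ist", "mit", "ein", "eine", "nicht", "sich"},
    "fr": {"le", "la", "les", "et", "est", "une", "des", "dans", "avec", "pour"},
    "es": {"el", "la", "los", "y", "es", "una", "con", "por", "para", "del"},
}


class OpenGraphParser(HTMLParser):
    """Collects the content of the first <meta property="..."> tag for each WANTED property."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop in WANTED and attributes.get("content"):
            self.found.setdefault(prop, attributes["content"])


def extract_title_and_synopsis(html: str) -> str:
    """Join the og:title and og:description values into one string."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.found.get(p, "") for p in WANTED).strip()


def guess_language(text: str) -> Optional[str]:
    """Crude guess: script ranges first, then stop-word counts; None if nothing matches."""
    if re.search(r"[\u3040-\u30ff]", text):  # Hiragana or Katakana
        return "ja"
    if re.search(r"[\uac00-\ud7af]", text):  # Hangul syllables
        return "ko"
    if re.search(r"[\u4e00-\u9fff]", text):  # CJK ideographs with no kana
        return "zh"
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, guess_language(extract_title_and_synopsis(html)) returns "de" for a page whose og tags carry German text. In practice you would swap guess_language for a proper detection API or library and keep only the extraction step.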