
I'm collecting data about some films in Russian and, using the Wikipedia API, I can query the data for a given film in JSON format:

https://ru.wikipedia.org/w/api.php?format=json&action=query&prop=revisions&rvprop=content&titles=%s

where %s is the title of the article.
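Cyrillic titles have to be percent-encoded before they are substituted for %s. A minimal Python sketch of building that URL, assuming the title arrives as plain text; the helper name `build_revisions_url` is hypothetical and not part of any API:

```python
from urllib.parse import urlencode

API_URL = "https://ru.wikipedia.org/w/api.php"


def build_revisions_url(title):
    """Return the request URL from the question for one article title."""
    params = {
        "format": "json",
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,  # urlencode percent-encodes the UTF-8 bytes of the title
    }
    return API_URL + "?" + urlencode(params)


print(build_revisions_url("Приходи пораньше"))
```

Letting `urlencode` handle the escaping avoids hand-formatting the query string for non-ASCII titles.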

One of the properties I need is the IMDb ID. The problem is that not all articles contain it in the wiki markup (specifically, in the film template), although the rendered page always shows it.

The Russian version of the film template says that the IMDb ID is taken automatically from Wikidata (the English version says that this infobox does not link to any database at all, in favor of the links section at the bottom).

Is there a way to request the IMDb ID using the Wikipedia API or the Wikidata API?

Stanislav Kralin
cybersoft
  • Did you try the OMDb API? It returns the IMDb ID. For example: http://www.omdbapi.com/?t=Under+Electric+Clouds&y=&plot=full&r=json – Vikash B Jan 24 '17 at 10:55
  • @Vikash Yes, I currently use the OMDb API, but where do I get the ID if the article doesn't contain it? – cybersoft Jan 24 '17 at 11:02

1 Answer


You can get all Wikidata film items that have an IMDb ID and a link to ruwiki using the Wikidata Query Service:

SELECT ?item ?IMDb_ID ?sitelink WHERE {
  ?item wdt:P31 wd:Q11424 .
  ?item wdt:P345 ?IMDb_ID .
  ?sitelink schema:about ?item ; schema:isPartOf <https://ru.wikipedia.org/> .
}

or

https://query.wikidata.org/bigdata/namespace/wdq/sparql?format=json&query=SELECT+?item+?IMDb_ID+?sitelink+WHERE+{?item+wdt:P31+wd:Q11424+.?item+wdt:P345+?IMDb_ID+.?sitelink+schema:about+?item+;+schema:isPartOf+%3Chttps://ru.wikipedia.org/%3E+.}

The result includes every matching Wikidata item, its IMDb ID and the ruwiki article linked to it. Each item in the result looks like this:

{
  "item" : {
    "value" : "http://www.wikidata.org/entity/Q203063"
  },
  "IMDb_ID" : {
    "value" : "tt0457308"
  },
  "sitelink" : {
    "value" : "https://ru.wikipedia.org/wiki/Приходи_пораньше"
  }
},
...
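To search these results by article, they can be loaded once into a dictionary keyed by the ruwiki URL. The following Python sketch assumes the response uses the standard SPARQL JSON results layout, with the items under `results.bindings`; that wrapper is not shown above. The `SAMPLE` string and the function name `index_by_sitelink` are illustrative only:

```python
import json

# Sample response holding the single item shown above, wrapped in the
# standard SPARQL JSON results layout (the wrapper is an assumption here).
SAMPLE = """
{
  "head": {"vars": ["item", "IMDb_ID", "sitelink"]},
  "results": {"bindings": [
    {
      "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q203063"},
      "IMDb_ID": {"type": "literal", "value": "tt0457308"},
      "sitelink": {"type": "uri", "value": "https://ru.wikipedia.org/wiki/Приходи_пораньше"}
    }
  ]}
}
"""


def index_by_sitelink(response_text):
    """Map each ruwiki article URL to its IMDb ID."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return {b["sitelink"]["value"]: b["IMDb_ID"]["value"] for b in bindings}


imdb_ids = index_by_sitelink(SAMPLE)
print(imdb_ids["https://ru.wikipedia.org/wiki/Приходи_пораньше"])  # prints tt0457308
```

Once the dictionary is built, each lookup is a single hash access, so the full download only has to happen once.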

And here is an example of how you can get the IMDb ID for just the Russian page Приходи пораньше.
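The linked example itself is not reproduced in this text, so the following Python sketch is only one possible way to restrict the query to a single page, not necessarily the linked one. It assumes sitelinks expose the article title through `schema:name` with an `@ru` language tag; the names `QUERY_TEMPLATE` and `build_single_title_url` are hypothetical:

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# Assumption: the sitelink node carries the page title as schema:name "..."@ru.
QUERY_TEMPLATE = """SELECT ?item ?IMDb_ID WHERE {
  ?sitelink schema:about ?item ;
            schema:isPartOf <https://ru.wikipedia.org/> ;
            schema:name "%s"@ru .
  ?item wdt:P345 ?IMDb_ID .
}"""


def build_single_title_url(title):
    """Return a query URL asking for the IMDb ID of one ruwiki article."""
    # Escape backslashes and double quotes so the title stays a valid
    # SPARQL string literal.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    query = QUERY_TEMPLATE % escaped
    return ENDPOINT + "?" + urlencode({"format": "json", "query": query})


print(build_single_title_url("Приходи пораньше"))
```

The title is passed with spaces, not underscores, because it is matched against the page name and not against the article URL.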

Termininja
  • So slow... It took about 5-8 sec to find. But it works, thanks! – cybersoft Jan 28 '17 at 08:15
  • @cybersoft I don't know what language you use, but did you try always running the query that gets all IMDb IDs and then searching the results by a specific title? I'm wondering how the speed compares with using the 'slow' query to get the ID directly for the same title. – Termininja Jan 28 '17 at 12:47
  • Here are the results: downloading into memory takes ~5 seconds, searching by URL ~0.7 ms. I'm using Java 1.8, and the search is done with the parallel Stream API (the debugger shows 3 threads in my case). It is all faster than sending a query to the server and waiting for the response... Maybe the server is slow, or it's network delay – cybersoft Jan 28 '17 at 13:29