3

I'm trying to build a query with the Wiki API that returns all internal links from a specific article as page ids. I have the pageId of an article; for example, the id of the article "Android (operating system)" is 12610483. On the client side I need to work only with ids, and later obtain all information by id alone. My goal is to find all internal links (the ids of the linked articles) for a given article id.

Unfortunately, the only way I have found returns the links as article titles: http://en.wikipedia.org/w/api.php?action=parse&format=json&pageid=12610483&prop=links

Is there any other way to obtain the ids of the links as well, and not only the titles?

svick
Yonatan Levin

2 Answers

4

What you want to do is to use action=query&prop=links to get data from the pagelinks database table, instead of parsing the page text.

This will still give you only page titles (because a link can lead to a non-existent page, which implies no page id).

But you can fix that by using prop=links as a generator:

http://en.wikipedia.org/w/api.php?action=query&format=json&pageids=12610483&generator=links&gpllimit=max

If the article has many links (like the one you suggested), you will need to use paging (see the gplcontinue element).
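A minimal Python sketch of the generator query with paging might look like the following. It assumes the current JSON response shape, where each batch lists pages under `query.pages` and a top-level `continue` object carries `gplcontinue` for the next request (older API versions used `query-continue` instead). The names `fetch_json` and `linked_page_ids`, and the User-Agent string, are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Perform one GET request against the API and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder; use one that identifies your client.
    request = urllib.request.Request(
        url, headers={"User-Agent": "link-id-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def linked_page_ids(pageid, fetch=fetch_json):
    """Return the sorted page ids of all existing pages linked from `pageid`."""
    params = {
        "action": "query",
        "format": "json",
        "pageids": pageid,
        "generator": "links",
        "gpllimit": "max",
    }
    ids = set()
    while True:
        data = fetch(params)
        pages = data.get("query", {}).get("pages", {})
        for page in pages.values():
            # Links to non-existent pages carry no "pageid" key; skip them.
            if "pageid" in page:
                ids.add(page["pageid"])
        if "continue" not in data:
            break
        # Assumed continuation format: merge gplcontinue etc. into the next request.
        params = {**params, **data["continue"]}
    return sorted(ids)
```

Calling `linked_page_ids(12610483)` issues one request per batch until the response no longer contains a `continue` object. The `fetch` parameter exists only so the loop can be exercised without network access.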

svick
  • @svick. This method only counts each link once. I wonder if there's any method that keeps track of the number of each link in a page? For example, not only it tells that the given page links to ***Android-x86*** but also it tells that **Android-x86** is used **k** times in the article? Is it possible with the mediawiki API? Thanks. – chepukha Mar 12 '14 at 06:06
  • 1
    @chepukha That information is not stored in the `pagelinks` table (or anywhere else, at least not directly), so you won't find an API method for that. You will need to parse the page text for that, either as raw wikitext or as HTML. – svick Mar 12 '14 at 16:09
  • @svick. I see. Thanks for confirming. I have a related question [here](http://stackoverflow.com/questions/22359695/how-to-get-internal-link-from-latest-revision-of-a-wikipedia-page). Please let me know if you have any insight. – chepukha Mar 12 '14 at 17:47
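As a rough illustration of the "parse the page text" route mentioned in the comments above, here is a small Python sketch that counts how often each link target occurs in raw wikitext. It is a simplification under stated assumptions: it only sees literal `[[...]]` links (not links produced by templates), it also counts file and category links, and the helper names `normalize_title` and `count_links` are hypothetical.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section]]; captures Target only.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalize_title(raw):
    """Approximate MediaWiki title normalization: underscores, trim, first letter upper."""
    title = raw.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def count_links(wikitext):
    """Count occurrences of each link target in a wikitext string."""
    return Counter(
        normalize_title(match.group(1)) for match in WIKILINK.finditer(wikitext)
    )
```

For a string such as `"[[Android-x86]] and [[android-x86|the port]]"`, both occurrences are counted under the same normalized title. Mapping those titles back to page ids would still need a separate API query.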
-3

I think you need to use PHP Simple HTML DOM Parser.

You can find it here: http://simplehtmldom.sourceforge.net/

  • 1
    1. There is absolutely no reason to use HTML parser here. 2. How is this going to help to get page ids? – svick Aug 26 '13 at 00:14