We are trying to fetch Italian data from Wikipedia's API. We have multiple names and need to get the first 10 results.

For example, we want to collect the data from the "Persone" section of this page: http://it.wikipedia.org/wiki/Francesco_(nome)

Right now I'm trying this approach:

$kw = $name."_(nome)";
$url = "http://it.wikipedia.org/w/api.php?format=json&action=query&titles=".$kw."&prop=revisions&rvprop=content";

Other questions did not help much; I'm getting no output.

svick
Francesco Frapporti
  • Are you using `file_get_contents()`? Just setting a variable to a URL does not get that URL's contents. – kittycat Mar 19 '13 at 19:01
  • What do you mean, no output? Are you getting an error? Are you aware that [you have to set the `User-Agent` header](http://www.mediawiki.org/wiki/API:Main_page#Identifying_your_client)? – svick Mar 19 '13 at 22:58
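
The two comments above can be combined into a minimal sketch: build the same query URL as in the question, actually request it with `file_get_contents()`, and send a `User-Agent` header. The function names (`buildApiUrl`, `fetchUrl`, `extractWikitext`) and the `https` scheme are assumptions made for illustration, not part of the original question. The `'*'` key reflects the API's legacy JSON layout for `rvprop=content`, which is what this query returns when no `formatversion` or `rvslots` parameter is given.

```php
<?php
// Hypothetical helper: builds the query URL from the question.
// http_build_query() percent-encodes the parentheses in the title.
function buildApiUrl(string $name): string
{
    $params = [
        'format' => 'json',
        'action' => 'query',
        'titles' => $name . '_(nome)',
        'prop'   => 'revisions',
        'rvprop' => 'content',
    ];
    return 'https://it.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Hypothetical helper: performs the HTTP request with an explicit User-Agent.
function fetchUrl(string $url, string $userAgent): string
{
    $context = stream_context_create([
        'http' => ['user_agent' => $userAgent, 'timeout' => 10],
    ]);
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Request failed: $url");
    }
    return $body;
}

// Hypothetical helper: pulls the wikitext of the first returned page out of
// the JSON response, or returns null if the page has no revisions (e.g. missing).
function extractWikitext(string $json): ?string
{
    $data  = json_decode($json, true);
    $pages = $data['query']['pages'] ?? [];
    $page  = reset($pages); // pages are keyed by page id, so take the first one
    return $page['revisions'][0]['*'] ?? null;
}
```

A call could then look like `extractWikitext(fetchUrl(buildApiUrl('Francesco'), 'MyNameScript/0.1 (me@example.com)'))`, where the User-Agent string is a placeholder to replace with your own contact details. The result is raw wikitext, so the "Persone" section still has to be located in it.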

1 Answer

You can use the PHP Simple HTML DOM Parser (see its docs).

After a quick look at the page's DOM, here is the code for the first name:

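// Assumes simple_html_dom.php has been placed in ./dom/
require('dom/simple_html_dom.php');

$name = 'Francesco';
$kw = $name . '_(nome)';
$html = file_get_html('http://it.wikipedia.org/wiki/' . $kw);

// The element with id "Persone" sits inside the section's <h2> heading
$span = $html->getElementById('Persone');
$h2 = $span->parent();

// The list is the fourth sibling after the heading; this depends on the page's current markup
$ul = $h2->next_sibling()->next_sibling()->next_sibling()->next_sibling();

// Keep only the first 10 entries, as the question asks
$lis = array_slice($ul->find('li'), 0, 10);

foreach ($lis as $li) {
    echo($li->plaintext . '<br />');
}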
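
The fixed chain of `next_sibling()` calls breaks as soon as the page layout changes. As a rough alternative sketch, the same lookup can be done with PHP's built-in `DOMDocument` and `DOMXPath` instead of Simple HTML DOM. This is a different technique from the one above. The function name `firstListItemsAfterId` is made up for illustration, and the sketch assumes the first `<ul>` after the heading is the list you want and that the id contains no double quotes.

```php
<?php
// Hypothetical helper: returns the text of up to $limit <li> items from the
// first <ul> that follows the element with the given id.
function firstListItemsAfterId(string $html, string $id, int $limit = 10): array
{
    $doc = new DOMDocument();
    // Real-world HTML triggers libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix is a common trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // following::ul[1] selects the first <ul> after the anchor in document order.
    $nodes = $xpath->query(sprintf('//*[@id="%s"]/following::ul[1]/li', $id));
    if ($nodes === false) {
        return []; // malformed XPath expression
    }

    $items = [];
    foreach ($nodes as $li) {
        if (count($items) >= $limit) {
            break;
        }
        $items[] = trim($li->textContent);
    }
    return $items;
}
```

The page HTML can be fetched with `file_get_contents()` and passed in as `$html`, with `'Persone'` as the id. Taking the `$limit` argument as 10 matches the "first 10 results" requirement from the question.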
Adidi