
I'm querying the Wikipedia API like so: http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=xml&exsentences=2&exlimit=10&exintro=&explaintext=&redirects=&generator=search&gsrsearch=France&gsrlimit=10

This returns XML, which I'm having trouble reading. I've tried the following, but nothing is output:

    ini_set("user_agent", 'myemail');
    $xml = simplexml_load_file('http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=xml&exsentences=2&exlimit=10&exintro=&explaintext=&redirects=&generator=search&gsrsearch=France&gsrlimit=10');
    header('Content-Type: text/xml');
    echo $xml->api->query->pages->page[0]->extract;

Can anyone tell me what I'm doing wrong? Please take into account that I'm an XML newbie here...

Phil

2 Answers


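Try dropping api from the path. The object returned by simplexml_load_file() already represents the root <api> element, so you start from its children: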

$xml=simplexml_load_file('http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=xml&exsentences=2&exlimit=10&exintro=&explaintext=&redirects=&generator=search&gsrsearch=France&gsrlimit=10');
echo $xml->query->pages->page[0]->extract;

Output: In the Second World War, the Battle of France, also known as the Fall of France, was the successful German invasion of France and the Low Countries, beginning on 10 May 1940, defeating primarily French forces. The battle consisted of two main operations.
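To see why the api step has to go, here is a minimal sketch that runs without a network call. The XML string is a hypothetical, trimmed-down stand-in for the real response (the page titles and extracts are made up); it only assumes the response is wrapped in a root <api> element, which is what the path in the question suggests.

```php
<?php
// Hypothetical sample mimicking the shape of the API's XML response.
// The titles and extracts are invented for illustration.
$sample = <<<'XML'
<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="1" title="Example A"><extract>First extract.</extract></page>
      <page pageid="2" title="Example B"><extract>Second extract.</extract></page>
    </pages>
  </query>
</api>
XML;

$xml = simplexml_load_string($sample);
if ($xml === false) {
    exit("Could not parse XML\n");
}

// $xml already is the root <api> element, so the path starts at <query>.
echo $xml->getName(), "\n";                       // api
echo $xml->query->pages->page[0]->extract, "\n";  // First extract.

// ->api would look for a child <api> inside the root, and there is none.
var_dump(isset($xml->api));
```

Checking the return value for false before walking the tree also helps tell a failed download or parse apart from a wrong path.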

Rakesh Sharma
  • That's giving me the following message: "This page contains the following errors: error on line 1 at column 1: Document is empty Below is a rendering of the page up to the first error." – Phil May 23 '14 at 11:36
  • It's really strange. When I write "echo $xml->asXML();" I get the whole xml. But when I switch to "echo $xml->query->pages->page[0]->extract;" I now get the following error: "This page contains the following errors: error on line 1 at column 1: Document is empty Below is a rendering of the page up to the first error." – Phil May 23 '14 at 11:42
  • Try removing the header() call and everything else, and test it on a plain, otherwise empty PHP page. – Rakesh Sharma May 23 '14 at 11:43

I would use the JSON format instead: change format=xml to format=json in the URL, then do:

$json = file_get_contents('http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exsentences=2&exlimit=10&exintro=&explaintext=&redirects=&generator=search&gsrsearch=France&gsrlimit=10');

$data = json_decode($json, true);

foreach ($data['query']['pages'] as $page) {
    echo '<p>' . $page['extract'] . '</p>';
}

With json_decode (and true as the second argument) you get a native PHP array, which I find much more intuitive than SimpleXML, but that's just preference.
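A slightly more defensive variant of the loop might look like the sketch below. The JSON string is a hypothetical stand-in for the real response (page IDs, titles and extracts are invented), and the idea that some pages may come back without an extract is an assumption, so treat the checks as a suggestion and not as required handling.

```php
<?php
// Hypothetical sample mimicking the JSON response shape: pages keyed by page ID.
$json = '{"query":{"pages":{'
      . '"101":{"pageid":101,"title":"Example A","extract":"First extract."},'
      . '"102":{"pageid":102,"title":"Example B"}'
      . '}}}';

$data = json_decode($json, true);
if (!is_array($data) || !isset($data['query']['pages'])) {
    exit('Unexpected response: ' . json_last_error_msg() . "\n");
}

$extracts = [];
foreach ($data['query']['pages'] as $pageId => $page) {
    // Assumption: a page may lack an extract, so skip it instead of raising a notice.
    if (!isset($page['extract'])) {
        continue;
    }
    $extracts[$pageId] = $page['extract'];
    // Escape the text before printing it into HTML.
    echo '<p>' . htmlspecialchars($page['extract']) . '</p>', "\n";
}
```

With the real API, $json would come from file_get_contents() as in the snippet above, and it is worth checking that call for a false return as well.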

Steve