I want to write a PHP function that connects to a Wikipedia URL and retrieves the content of the article. I am using cURL with PHP, following this blog.

The problem: cURL returns no content for the URL, and the script fails with an error.

This is my code:

<?php 
$wikipediaURL = 'http://fr.wikipedia.org/wiki/Megadeth';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wikipediaURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Le blog de Samy Dindane (www.dinduks.com)');
$resultat = curl_exec ($ch);
curl_close($ch);
$wikipediaPage = new DOMDocument();
$wikipediaPage->loadHTML($resultat);
foreach ($wikipediaPage->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') == "bodyContent") {
        $description = '<p>' . $div->getElementsByTagName('p')->item(0)->nodeValue . '</p>';
        $description = preg_replace('/\[[0-9]*\][,]|\[[0-9]*\]/', '', $description);
        echo $description;
    }
}
?>

This is the error message:

Warning: DOMDocument::loadHTML(): Empty string supplied as input in C:\wamp\www\Project1\wiki5.php on line 12

I have used the same function with other URLs and it works; it fails only with Wikipedia URLs.

Any help would be appreciated. Thanks!

gofr1
Adem
  • You're not checking whether the cURL call was actually successful. Check [my answer to another question](http://stackoverflow.com/questions/8227909/curl-exec-always-returns-false/13311209#13311209) to find out how to diagnose the call. – Linus Kleen Mar 08 '16 at 17:51
  • You aren't using the `wikipedia` API. I presume they block blank requests. https://www.mediawiki.org/wiki/API:Main_page – chris85 Mar 08 '16 at 17:51
  • Wouldn't `file_get_contents` work? `$wikipediaURL = 'http://fr.wikipedia.org/wiki/Megadeth'; $tmp = file_get_contents($wikipediaURL); echo $tmp;` – SamyQc Mar 08 '16 at 18:50
  • I do not want to display the contents directly; I want to retrieve them and process them further later. So I must first convert them to a DOM (as written in the code), and I can't do that with file_get_contents. – Adem Mar 08 '16 at 19:52
  • If you are asking for help with a cURL error, please provide that error, not the DOMDocument error caused by not handling the original error. – Tgr Mar 09 '16 at 09:11
  • @Tgr This is not a DOMDocument error, it is a cURL error. DOMDocument raises an error only because the variable ($resultat) passed as a parameter is empty. So it is a cURL error. – Adem Mar 09 '16 at 14:56
  • So you suspect there is a cURL error, you share a DomDocument error message in your question, and expect others will be able to figure out what your problem is. Not super effective. If you don't know how to get the error message from a curl call, you should be asking that. (Or, really, google it.) – Tgr Mar 10 '16 at 07:57
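
A minimal sketch of the diagnosis suggested in the comments above, assuming the same cURL options as in the question. The helper name `fetchPage` and the shape of the array it returns are illustrative assumptions, not part of the original code. It reports the cURL error, the HTTP status, and any redirect target instead of passing an empty string on to DOMDocument.

```php
<?php
// Hypothetical helper (not from the original post): run the same cURL
// request as in the question, but report why it produced no content.
function fetchPage(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Le blog de Samy Dindane (www.dinduks.com)');

    $body = curl_exec($ch);

    if ($body === false) {
        // The transfer itself failed (DNS, TLS, unsupported protocol, ...).
        return [
            'body'     => null,
            'status'   => 0,
            'redirect' => null,
            'error'    => curl_errno($ch) . ': ' . curl_error($ch),
        ];
    }

    // The transfer succeeded; the status code and redirect target show
    // whether the server answered with something other than the page.
    return [
        'body'     => $body,
        'status'   => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'redirect' => curl_getinfo($ch, CURLINFO_REDIRECT_URL),
        'error'    => null,
    ];
}
```

Calling `var_dump(fetchPage($wikipediaURL));` would then show either a cURL error message or the HTTP status. A 3xx status with a non-empty `redirect` would suggest that the server is redirecting and the redirect is not being followed.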

1 Answer

Simply add the CURLOPT_FOLLOWLOCATION option, and your code will work (Wikipedia redirects http:// URLs to https://, and cURL does not follow redirects by default):

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wikipediaURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // <---- added
curl_setopt($ch, CURLOPT_USERAGENT, 'Le blog de Samy Dindane (www.dinduks.com)');
$resultat = curl_exec($ch);
curl_close($ch);
fusion3k
  • I added CURLOPT_FOLLOWLOCATION and it gives me the same result: Empty string supplied as input. – Adem Mar 08 '16 at 18:04
  • Do you use cURL successfully elsewhere? I have tested it and it works for me – fusion3k Mar 08 '16 at 18:08
  • For the record: try it with `echo file_get_contents( $wikipediaURL );` – fusion3k Mar 08 '16 at 18:13
  • I use the same code with other URLs (not Wikipedia URLs) and it works, so the problem is specific to Wikipedia URLs. – Adem Mar 08 '16 at 18:42
  • I tried echo file_get_contents( $wikipediaURL ); and it displays the page content. I don't understand what the problem is or how I can fix it! – Adem Mar 08 '16 at 18:55
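
Since `file_get_contents` returns the page as a plain string, that string can be passed to `DOMDocument::loadHTML()` just like the cURL result. Below is a minimal sketch of the parsing step, assuming the article text sits in a `div` with id `bodyContent` as in the question. The helper name `firstParagraph` is hypothetical, and the container id is assumed to contain no double quotes.

```php
<?php
// Hypothetical helper (not from the original post): return the first
// non-empty paragraph inside the div with the given id, or null.
function firstParagraph(string $html, string $containerId = 'bodyContent'): ?string
{
    if (trim($html) === '') {
        // Nothing to parse; this avoids the "Empty string supplied" warning.
        return null;
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $paragraphs = $xpath->query('//div[@id="' . $containerId . '"]//p');

    foreach ($paragraphs as $p) {
        $text = trim($p->nodeValue);
        if ($text !== '') {
            // Strip footnote markers such as [1] or [12], with an optional
            // trailing comma (same intent as the regex in the question).
            return preg_replace('/\[[0-9]*\],?/', '', $text);
        }
    }

    return null;
}
```

For example, `echo firstParagraph(file_get_contents($wikipediaURL));` would print the first paragraph once the fetch succeeds. It is worth checking that `file_get_contents` did not return `false` before parsing.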