I'm using PHP's cURL to extract some tag information (the page's <h1> and meta description) from various URLs. My requests work some of the time, but other times they don't work at all. Is there a reason my code fails? (Note that I'm also using simple_html_dom.)

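$webpage = 'http://www.some_url.com';

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $webpage);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);     // return the page as a string instead of printing it
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);    // give up if no connection is made within 10 seconds
curl_setopt($curl, CURLOPT_FAILONERROR, true);     // treat HTTP status codes >= 400 as failures
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
curl_setopt($curl, CURLOPT_AUTOREFERER, true);     // set the Referer header when following redirects
curl_setopt($curl, CURLOPT_FRESH_CONNECT, true);   // open a new connection instead of reusing a cached one

$str = curl_exec($curl);   // false on failure
curl_close($curl);

// Defaults, so both variables are defined even if the fetch or the parse fails
$webpage_name = '';
$description = '';

if( !empty($str) )
{
    require_once( 'simple_html_dom.php');

    $html = str_get_html($str);   // false for empty input or input larger than MAX_FILE_SIZE
    if( $html )
    {
        $element = $html->find('h1', 0);   // first <h1>, or null if there is none
        if( $element )
            $webpage_name = strip_tags($element);

        $item = $html->find('meta[name=description]', 0);   // null if the page has no description
        if( $item )
            $description = $item->content;
    }
}

// save $description to database
// save $webpage_name to database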

For about half the URLs I try, the description and webpage_name are stored in my database, but for the other half nothing is stored and the script just stalls. When a user submits a URL to my website, a progress bar is shown while the URL is being submitted. When the submission completes, the progress bar disappears and the URL is displayed on my page for the user to see. For the troublesome URLs, the progress bar goes away, but the link never appears on the page and nothing is stored in my database. What am I missing?

Michael Petrotta
Ajay Mohite
  • Do you have shell access to whatever machine is running this script? Try telnetting to port 80 on these 'bad' URLs and see if that times out as well. If it does, then it's not cURL; there's a firewall somewhere blocking the hit. – Marc B Aug 05 '12 at 04:48
  • Could be a server-side or network problem. Sometimes the Internet is slow... – Thilo Aug 05 '12 at 04:48
  • One other bit of information: sometimes, but not always, I can get some of the 'bad' URLs to work on my MAMP development server, but not on my Linux production server. So I don't think the URLs themselves are the problem. Also, extending cURL's timeout has no effect. – Ajay Mohite Aug 05 '12 at 04:53
  • Have you checked the status code returned by cURL? Get this info using `$info = curl_getinfo($ch)` and store `$info` in a log file. Let the site run for a while, then check the log file for transfer details. – Uday Sawant Aug 05 '12 at 04:56
  • Two things here. First, you can try removing the timeout or increasing it to a higher value; sometimes it just takes a while to make a connection. Second, if cURL is not a necessity, you can try using `file_get_html($webpage);` directly in simple_html_dom. – Prasanth Aug 05 '12 at 05:01
  • What is `$ch` in `$info = curl_getinfo($ch)`? – Ajay Mohite Aug 05 '12 at 05:09
  • The thing is, sometimes I can get a URL to work on my development server, and it runs really fast. Then, on my Linux production server it doesn't work at all. – Ajay Mohite Aug 05 '12 at 05:11
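
Marc B's telnet test can also be run from PHP on the production server itself, which helps when shell access is limited. This is a minimal sketch assuming PHP 7 or later; `can_connect()` is a hypothetical helper name, and a successful TCP connection only shows that the port is reachable, not that the HTTP request will succeed.

```php
<?php
// Hypothetical helper: a PHP stand-in for "telnet host 80".
// Returns true if a TCP connection to $host:$port opens within $timeout seconds.
function can_connect(string $host, int $port = 80, float $timeout = 5.0): bool
{
    $errno = 0;
    $errstr = '';
    // fsockopen() returns a stream resource on success and false on failure;
    // @ suppresses the warning so the caller only sees the boolean result.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}
```

If `can_connect('www.some_url.com')` returns false on the production server but true on the development machine, the problem is likely in the network path (for example a firewall) rather than in cURL.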

3 Answers

Try calling curl_getinfo() before your curl_close() call. Among a lot of other useful information, it returns the HTTP status code, which tells you what is happening with your requests. That should give you the answers you need; just make sure to remove the CURLOPT_FAILONERROR setting (or set it to false).
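
A sketch of that suggestion, assuming PHP 7 or later. `fetch_with_info()` and `summarize_fetch()` are hypothetical helper names; the options mirror the question's code, with CURLOPT_FAILONERROR left out as the answer recommends, and curl_errno()/curl_error() are added to catch transfer-level failures that have no HTTP status at all.

```php
<?php
// Hypothetical helper: fetch a URL and keep the diagnostic details.
function fetch_with_info(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // CURLOPT_FAILONERROR is deliberately not set, so 4xx/5xx responses
    // still come back with a body and a status code to inspect.

    $body  = curl_exec($ch);       // string on success, false on a transfer error
    $info  = curl_getinfo($ch);    // must be read before curl_close()
    $errno = curl_errno($ch);      // 0 when no cURL error occurred
    $error = curl_error($ch);      // human-readable message, '' when none
    curl_close($ch);

    return ['body' => $body, 'info' => $info, 'errno' => $errno, 'error' => $error];
}

// Hypothetical helper: one-line summary of a fetch result for a log file.
function summarize_fetch(array $result): string
{
    if ($result['errno'] !== 0) {
        return sprintf('cURL error %d: %s', $result['errno'], $result['error']);
    }
    $code = isset($result['info']['http_code']) ? (int) $result['info']['http_code'] : 0;
    if ($code >= 400) {
        $url = isset($result['info']['url']) ? $result['info']['url'] : '?';
        return sprintf('HTTP %d from %s', $code, $url);
    }
    return sprintf('OK: HTTP %d, %d bytes', $code, strlen((string) $result['body']));
}
```

Calling something like `error_log(summarize_fetch(fetch_with_info($webpage)));` for each submitted URL would leave one line per request in the PHP error log, close to the logging approach Uday Sawant describes in the comments.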

guillermoandrae

My error log shows "Call to undefined function mb_detect_encoding()". That function is provided by the mbstring extension, which simple_html_dom.php needs. Calling an undefined function is a fatal error, so the script stops before anything is saved to the database. MAMP has mbstring enabled by default, which is why the script works on my development server but not on my production server. I have put in a request to have mbstring enabled on my Linux production server, so I'll let everyone know whether this was in fact the problem. I have seen several posts online from people with the same problem, so I hope this helps a lot of people.
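
One way to turn a missing extension into an explicit, logged error instead of a silent stall is to check for it before loading simple_html_dom.php. This is a minimal sketch assuming PHP 7 or later; `missing_extensions()` and `require_extensions()` are hypothetical helper names, and the extension list in the usage note reflects this thread (curl for fetching, mbstring for simple_html_dom).

```php
<?php
// Hypothetical helper: names of the required extensions that are not loaded.
function missing_extensions(array $required): array
{
    $missing = [];
    foreach ($required as $ext) {
        if (!extension_loaded($ext)) {
            $missing[] = $ext;
        }
    }
    return $missing;
}

// Hypothetical helper: throw a descriptive exception if anything is missing.
function require_extensions(array $required)
{
    $missing = missing_extensions($required);
    if ($missing !== []) {
        throw new RuntimeException('Missing PHP extensions: ' . implode(', ', $missing));
    }
}
```

Calling `require_extensions(['curl', 'mbstring'])` before `require_once('simple_html_dom.php')` and logging the exception would surface the problem on the production server immediately. A narrower check is `function_exists('mb_detect_encoding')`.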

Ajay Mohite
  • Yep, that was it. I have noticed lots of people who appear to have the same problem. This caused me a major headache. Just make sure the `mbstring` extension is enabled in PHP. – Ajay Mohite Aug 06 '12 at 00:38

Your question was a long time ago, but here is my solution. I had the same problem: cURL worked locally on my Windows machine but not on Linux, and only for some URLs, not all of them. I was already setting CURLOPT_SSL_VERIFYPEER to false; then I added CURLOPT_SSL_VERIFYHOST as well. At least in my case, the failing URLs were caused by SSL certificates that were not properly configured for the domains I was trying to access. I don't know why it worked on Windows even without this parameter, but it fixed the problem for me.
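
Before turning verification off, it may be worth confirming that the failures really are certificate problems. This is a hedged sketch assuming PHP 7 or later: `is_ssl_error()` and `fetch_url()` are hypothetical helpers, the numbers are libcurl error codes (names in the comments), and the `0` passed to CURLOPT_SSL_VERIFYHOST is an assumption, since the answer does not say which value was used. Disabling both checks leaves the connection open to interception, so it is better treated as a diagnostic step than a permanent fix.

```php
<?php
// Hypothetical helper: true if a cURL error number is TLS/certificate related.
function is_ssl_error(int $errno): bool
{
    $sslErrors = [
        35, // CURLE_SSL_CONNECT_ERROR: the TLS handshake failed
        51, // CURLE_PEER_FAILED_VERIFICATION in libcurl before 7.62.0
        58, // CURLE_SSL_CERTPROBLEM: problem with the local client certificate
        60, // CURLE_PEER_FAILED_VERIFICATION (CURLE_SSL_CACERT before 7.62.0)
        77, // CURLE_SSL_CACERT_BADFILE: the CA certificate bundle could not be read
    ];
    return in_array($errno, $sslErrors, true);
}

// Hypothetical helper: returns [body or false, cURL error number].
function fetch_url(string $url, bool $verifyTls = true): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if (!$verifyTls) {
        // INSECURE: skips the certificate-chain check and the host-name check.
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0); // value 0 is an assumption
    }
    $body = curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);
    return [$body, $errno];
}
```

A cautious flow would fetch with verification on and retry with `fetch_url($url, false)` only when `is_ssl_error($errno)` is true, logging that the site's certificate appears to be misconfigured.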

Flavio