I tried to crawl this New York Times article with cURL: Article

The function:

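function get_content_curl($url) {
   $ch = curl_init();

   curl_setopt($ch, CURLOPT_AUTOREFERER, true);      // set the Referer header automatically on redirects
   curl_setopt($ch, CURLOPT_HEADER, false);          // return the body only, without response headers
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the response instead of printing it
   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow HTTP redirects
   // Fall back to a generic user agent when there is no incoming request (e.g. on the CLI).
   curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0');
   $data = curl_exec($ch);   // false on failure
   curl_close($ch);

   return $data;
}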

It fails to fetch the article: the request is redirected to a login page, so the function returns the login page instead of the article. Why?

How can I prevent the redirect? I also tried setting CURLOPT_FOLLOWLOCATION to false, but that doesn't work. How can I fix this?

Answer to my own question:

Adding these two lines, which save the cookies and read them back, made it work:

 curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
 curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
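
For reference, here is a minimal sketch of the whole function with the two cookie options in place. The likely reason it helps is that the site sets a cookie during the redirect chain and sends clients that do not return it to the login page, though that is an assumption about the site's behaviour. The names build_curl_options() and fetch_with_cookies() are hypothetical, and so is the cookie file location. Note that a relative path such as 'cookies.txt' is resolved against the current working directory, which must be writable.

```php
<?php
// Sketch only: the function names and the cookie file path are assumptions,
// not part of the original code.

/** Build the cURL options, with one file used to both store and read cookies. */
function build_curl_options(string $url, string $cookieFile): array {
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_AUTOREFERER    => true,   // set Referer on redirects
        CURLOPT_MAXREDIRS      => 10,     // avoid endless redirect loops
        CURLOPT_USERAGENT      => $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0',
        CURLOPT_COOKIEJAR      => $cookieFile,   // write cookies here when the handle closes
        CURLOPT_COOKIEFILE     => $cookieFile,   // read cookies from here, enables cookie handling
    ];
}

/** Fetch a URL, keeping cookies across redirects; throws on transport errors. */
function fetch_with_cookies(string $url, string $cookieFile): string {
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $cookieFile));

    $data = curl_exec($ch);
    if ($data === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }

    curl_close($ch);
    return $data;
}

// Example usage (performs a real request, so it is left commented out):
// $html = fetch_with_cookies($url, sys_get_temp_dir() . '/cookies.txt');
```

Checking the return value of curl_exec() also makes it easier to tell a transport failure from a page that merely has unexpected content, such as the login page.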
mwweb
