Can't crawl URL with cURL

Asked Jul 15 '14 at 19:23

Active Jul 22 '14 at 10:31

Viewed 125 times

I tried to crawl this New York Times article with cURL: Article

The function:

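function get_content_curl($url) {
   $ch = curl_init();

   // Set the Referer header automatically when following redirects.
   curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
   // Exclude response headers from the returned data.
   curl_setopt($ch, CURLOPT_HEADER, 0);
   // Return the response as a string instead of printing it.
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
   curl_setopt($ch, CURLOPT_URL, $url);
   // Follow Location: redirects sent by the server.
   curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
   // Reuse the user agent of the browser that requested this script.
   curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
   $data = curl_exec($ch);
   curl_close($ch);

   return $data;
}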

It fails to fetch the article: the request is redirected to a login page, so the function returns the login page instead of the article. Why?

How can I prevent the redirect? I also tried setting CURLOPT_FOLLOWLOCATION to false, but that doesn't work either. How can I fix this?

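Answer to my own question:

I added these two lines to write and read cookies, and now it works. Presumably the site sets a cookie during the redirect and sends the client to the login page when that cookie does not come back.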

 curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
 curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
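For reference, here is a minimal sketch of the whole function with the two cookie options folded in. The helper name build_curl_options, the $cookieFile parameter, the fallback user agent string, and the error handling are illustrative assumptions, not part of the original code.

```php
<?php
// Sketch: the question's options plus the cookie jar from the answer.
// build_curl_options() and $cookieFile are hypothetical names.

function build_curl_options($url, $cookieFile)
{
    // $_SERVER['HTTP_USER_AGENT'] is not set when the script runs from
    // the command line, so fall back to a generic string (an assumption).
    $userAgent = isset($_SERVER['HTTP_USER_AGENT'])
        ? $_SERVER['HTTP_USER_AGENT']
        : 'Mozilla/5.0 (compatible; example-fetcher)';

    return array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_USERAGENT      => $userAgent,
        // Save cookies received from the server to this file...
        CURLOPT_COOKIEJAR      => $cookieFile,
        // ...and read cookies from it, which also enables cookie handling
        // for the redirects followed within this request.
        CURLOPT_COOKIEFILE     => $cookieFile,
    );
}

function get_content_curl($url, $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $cookieFile));

    $data = curl_exec($ch);
    if ($data === false) {
        // Surface transport errors instead of silently returning false.
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }

    curl_close($ch);
    return $data;
}
```

A call might look like get_content_curl($url, sys_get_temp_dir() . '/cookies.txt'). An absolute path avoids depending on the script's working directory, which a bare 'cookies.txt' does.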

edited Jul 22 '14 at 10:31

mwweb

http://stackoverflow.com/questions/20986395/why-does-curl-not-work-but-wget-works I think this can help! – Wikunia Jul 15 '14 at 19:28
curl_setopt($ch, CURLOPT_VERBOSE, true); could help you get some useful debugging information – Ruben de Vries Jul 15 '14 at 19:43
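The CURLOPT_VERBOSE suggestion can be sketched roughly as follows, assuming you want the log as a string rather than on stderr. The function name debug_fetch, the $cookieFile parameter, and the returned array keys are hypothetical. The log and the redirect count should show whether the request ended up on the login page.

```php
<?php
// Sketch: capture cURL's verbose log and redirect details for one request.
// debug_fetch() and its return keys are illustrative names.

function debug_fetch($url, $cookieFile)
{
    // Verbose output goes to CURLOPT_STDERR; point it at a temp stream.
    $log = fopen('php://temp', 'w+');

    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_VERBOSE        => true,
        CURLOPT_STDERR         => $log,
    ));

    $body = curl_exec($ch);
    $result = array(
        'body'           => $body,
        'http_code'      => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        // The URL reached after following all redirects.
        'effective_url'  => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirect_count' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    );

    // Release the handle before reading the log so nothing is still
    // writing to the stream.
    unset($ch);

    rewind($log);
    $result['log'] = stream_get_contents($log);
    fclose($log);

    return $result;
}
```

Printing $result['effective_url'] and $result['log'] after a call shows each request and response header in the redirect chain, including any Set-Cookie and Cookie headers.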
