I'm trying to scrape some recipes off a page to use as samples for a school project, but when I run my script the browser just shows a blank page.
I'm following this tutorial - here
This is my code:
<?php
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}

function scrape_between($data, $start, $end) {
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start)); // Stripping $start
    $stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
    return $data; // Returning the scraped data from the function
}

$continue = true;
$url = curl("https://www.justapinch.com/recipes/main-course/");

while ($continue == true) {
    $results_page = curl($url);
    $results_page = scrape_between($results_page, "<div id=\"grid-normal\">", "<div id=\"rightside-content\"");
    $separate_results = explode("<h3 class=\"tight-margin\"", $results_page);
    foreach ($separate_results as $separate_result) {
        if ($separate_result != "") {
            $results_urls[] = "https://www.justapinch.com" . scrape_between($separate_result, "href=\"", "\" class=\"");
        }
    }
    // Commented out to test code above
    // if (strpos($results_page, "Next Page")) {
    //     $continue = true;
    //     $url = scrape_between($results_page, "<nav><div class=\"col-xs-7\">", "</div><nav>");
    //     if (strpos($url, "Back</a>")) {
    //         $url = scrape_between($url, "Back</a>", ">Next Page");
    //     }
    //     $url = "https://www.justapinch.com" . scrape_between($url, "href=\"", "\"");
    // } else {
    //     $continue = false;
    // }
    // sleep(rand(3,5));
    print_r($results_urls);
}
?>
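To check what scrape_between does on its own, here is a minimal sketch that runs it against a made-up snippet of markup. The sample HTML and the /recipes/example path are assumptions for illustration, not copied from the real page; the helper is repeated only so the snippet runs by itself.

```php
<?php
// Same helper as in my script, repeated so this snippet is self-contained.
function scrape_between($data, $start, $end) {
    $data = stristr($data, $start);        // keep everything from $start onwards
    $data = substr($data, strlen($start)); // drop $start itself
    $stop = stripos($data, $end);          // find where $end begins
    return substr($data, 0, $stop);        // keep only the text before $end
}

// Hypothetical markup, not taken from the real site.
$sample = '<h3 class="tight-margin"><a href="/recipes/example" class="title">Example</a></h3>';

$path = scrape_between($sample, 'href="', '" class="');
echo $path, "\n";
```

On this sample the helper picks out the link path between href=" and " class=", which is what the foreach loop relies on for each result.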
I'm using cloud9, I've installed php5 cURL, and am running apache2. I would appreciate any help.
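A blank page can mean that PHP is hiding an error, so a sketch like the following might help show what is actually failing. It assumes a development setup where displaying errors is acceptable; curl_debug is a hypothetical variant of my curl function, and the 10-second timeout is an arbitrary choice.

```php
<?php
// Show PHP errors in the output while debugging (development use only).
ini_set('display_errors', '1');
error_reporting(E_ALL);

// Hypothetical helper: like curl(), but also returns cURL's error message.
function curl_debug($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10); // assumed limit so a hung request cannot stall the page
    $data = curl_exec($ch);                // false on failure
    $error = ($data === false) ? curl_error($ch) : '';
    curl_close($ch);
    return array($data, $error);
}

list($body, $error) = curl_debug('https://www.justapinch.com/recipes/main-course/');
if ($body === false) {
    echo 'cURL error: ' . $error . "\n";
} else {
    echo 'Fetched ' . strlen($body) . " bytes\n";
}
```

If this prints a cURL error, the problem is in the request itself; if it reports a byte count, the fetch works and the problem is more likely in the scraping loop.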