
I'm trying to scrape some recipes off a page to use as samples for a school project, but my script just keeps producing a blank page.

I'm following this tutorial - here

This is my code:

<?php

function curl($url) {
    $ch = curl_init();  // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url);    // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);    // Closing cURL
    return $data;   // Returning the data from the function
}
function scrape_between($data, $start, $end){
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start));  // Stripping $start
    $stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
    return $data;   // Returning the scraped data from the function
}

$continue = true;

$url = curl("https://www.justapinch.com/recipes/main-course/");

while ($continue == true) {
    $results_page = curl($url);
    $results_page = scrape_between($results_page,"<div id=\"grid-normal\">","<div id=\"rightside-content\"");
    $separate_results = explode("<h3 class=\"tight-margin\"",$results_page);

    foreach ($separate_results as $separate_result) {
        if ($separate_result != "") {
            $results_urls[] = "https://www.justapinch.com" . scrape_between($separate_result,"href=\"","\" class=\"");
        }
    }

    // Commented out to test code above

    // if (strpos($results_page,"Next Page")) {
    //     $continue = true;
    //     $url = scrape_between($results_page,"<nav><div class=\"col-xs-7\">","</div><nav>");
    //     if (strpos($url,"Back</a>")) {
    //         $url = scrape_between($url,"Back</a>",">Next Page");
    //     }
    //     $url = "https://www.justapinch.com" . scrape_between($url, "href=\"", "\"");
    // } else {
    //     $continue = false;
    // }
    // sleep(rand(3,5));

    print_r($results_urls);
}
?>

I'm using cloud9 and I've installed php5 cURL, and am running apache2. I would appreciate any help.

monsty

2 Answers


This is where the problem lies:

$results_page = curl($url);

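You are passing the HTML of a page to curl() instead of a URL. Right before the while loop, $url is set to the return value of curl(...), which is the page content, not an address. Inside the loop, curl($url) then tries to use that HTML as a URL, so the request fails and there is nothing to scrape. I think you should do the following: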

$results_page = curl("https://www.justapinch.com/recipes/main-course/");
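
If you later re-enable the pagination code, $url needs to stay a plain URL string on every iteration. A minimal sketch of a guard that would have surfaced this mistake right away; assert_is_url is a hypothetical helper name, not part of cURL or of the tutorial:

```php
<?php
// Hypothetical guard (name is an assumption): rejects anything that is not
// an http(s) URL before it reaches cURL.
function assert_is_url($url) {
    $ok = is_string($url)
        && filter_var($url, FILTER_VALIDATE_URL) !== false
        && preg_match('#^https?://#i', $url) === 1;
    if (!$ok) {
        $shown = is_string($url) ? substr($url, 0, 40) : gettype($url);
        throw new InvalidArgumentException('Expected a URL, got: ' . $shown);
    }
    return $url;
}

// Keep $url as a string; do not assign the result of curl(...) to it.
$url = "https://www.justapinch.com/recipes/main-course/";
assert_is_url($url);

// Inside the loop, the fetch would then be:
// $results_page = curl(assert_is_url($url));
```

With the original code, $url held a whole HTML document, so the guard would throw instead of letting the script fail silently.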

edit:

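You should also change how you query the HTML: parse it with a DOM parser (for example PHP's DOMDocument) instead of cutting it up with string functions, which break as soon as the markup changes slightly.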
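
As a rough sketch of the DOM approach, using PHP's built-in DOMDocument and DOMXPath: the function name extract_recipe_urls is hypothetical, and the markup it looks for (links inside h3 elements with the class tight-margin) is only inferred from the strings in the question, not checked against the live site.

```php
<?php
// Hypothetical helper: collects recipe links from the HTML of a results page.
// Assumed markup: <h3 class="tight-margin"><a href="/recipes/...">...</a></h3>
function extract_recipe_urls($html, $base_url) {
    if (!is_string($html) || $html === '') {
        return array(); // nothing to parse, e.g. the fetch failed
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match h3 elements whose class list contains "tight-margin",
    // then take every link with an href below them
    $xpath = new DOMXPath($doc);
    $links = $xpath->query(
        '//h3[contains(concat(" ", normalize-space(@class), " "), " tight-margin ")]//a[@href]'
    );

    $urls = array();
    foreach ($links as $link) {
        $urls[] = $base_url . $link->getAttribute('href');
    }
    return $urls;
}

// Made-up sample markup, for illustration only
$sample = '<div id="grid-normal">'
    . '<h3 class="tight-margin"><a href="/recipes/main-course/a" class="x">A</a></h3>'
    . '<h3 class="tight-margin"><a href="/recipes/main-course/b" class="x">B</a></h3>'
    . '</div>';
print_r(extract_recipe_urls($sample, 'https://www.justapinch.com'));
```

Unlike scrape_between, this does not depend on attribute order or exact whitespace in the page source.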

ariefbayu

Why do people do this? The code is completely void of error checking, and then they go to some forum and ask why code that ignores any and all errors is not working. I DON'T FKING KNOW, BUT AT LEAST YOU COULD PUT UP SOME ERROR CHECKING AND RUN IT BEFORE ASKING. It's not just you, lots of people are doing it, and it's annoying af, and you should all feel bad for doing it.

curl_setopt returns bool(false) if there's an error setting the option. curl_exec returns bool(false) if there was an error in the transfer. curl_init returns bool(false) if there was an error creating the curl handle. Extract the error description with curl_error, and report it with \RuntimeException.

Now delete this thread, add some error checking, and if the error checking does not reveal the problem, or it does but you're not sure how to fix it, THEN make a new thread about it.

here's some error-checking function wrappers to get you started:

function ecurl_setopt ( /*resource*/$ch , int $option , /*mixed*/ $value ):bool{
    $ret=curl_setopt($ch,$option,$value);
    if($ret!==true){
        //option should be obvious by stack trace
        throw new RuntimeException ( 'curl_setopt() failed. curl_errno: ' . return_var_dump ( curl_errno ($ch) ).'. curl_error: '.curl_error($ch) );
    }
    return true;
}
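function ecurl_exec ( /*resource*/$ch){
    // With CURLOPT_RETURNTRANSFER set, curl_exec() returns the response body
    // as a string, so only a strict false signals failure
    $ret=curl_exec($ch);
    if($ret===false){
        throw new RuntimeException ( 'curl_exec() failed. curl_errno: ' . return_var_dump ( curl_errno ($ch) ).'. curl_error: '.curl_error($ch) );
    }
    return $ret;
}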


function return_var_dump(/*...*/){
    $args = func_get_args ();
    ob_start ();
    call_user_func_array ( 'var_dump', $args );
    return ob_get_clean ();
}
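
For reference, the same checks applied directly to the curl() function from the question might look roughly like this. It is a self-contained sketch, not the one right way to do it; fetch_or_throw is a hypothetical name, and the demo URL uses a deliberately unsupported scheme so the failure path runs without touching the network.

```php
<?php
// Hypothetical variant of the question's curl() function that reports
// failures instead of silently returning false.
function fetch_or_throw($url) {
    $ch = curl_init();
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }
    if (!curl_setopt($ch, CURLOPT_URL, $url)
        || !curl_setopt($ch, CURLOPT_RETURNTRANSFER, true)) {
        $message = 'curl_setopt() failed: ' . curl_error($ch);
        curl_close($ch);
        throw new RuntimeException($message);
    }
    $data = curl_exec($ch);
    if ($data === false) {
        // Read the error details before closing the handle
        $message = 'curl_exec() failed (' . curl_errno($ch) . '): ' . curl_error($ch);
        curl_close($ch);
        throw new RuntimeException($message);
    }
    curl_close($ch);
    return $data;
}

try {
    // "foo" is not a protocol cURL supports, so this fails immediately
    fetch_or_throw('foo://not-a-real-protocol');
} catch (RuntimeException $e) {
    echo 'Fetch failed: ' . $e->getMessage() . "\n";
}
```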
hanshenrik
  • Dude, I appreciate your help but there's no need to be rude. If it annoys you, just ignore it. People who are relatively new to coding may be trying to get their head around the actual code and may not always be aware that error checking could help a lot. We all have to start somewhere, don't we? You're an experienced programmer and your skills surpass a lot of people's, but that doesn't mean it's OK to be rude. – monsty Nov 18 '17 at 23:24