I modified the script below to get all links on the $url set in the code.

It seems to work to some extent: it gets every page's URL, but it does not parse every page. It parses only the first page and repeats that result for all the others.

Can someone tell me what I am doing wrong here? I have already spent more than a day trying everything. I've also included the result that I am getting.

<?php
include('simple_html_dom.php');
$base = "http://singersroom.com";
$url = "http://singersroom.com/subcontent/rnb-news/";

// Start from the main page
$nextLink = $url;

// Loop over each next link for as long as one exists
while ($nextLink) {
    echo "<hr>nextLink: $nextLink<br>";
    //Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a url
    $html->load_file($nextLink);
    $posts = $html->find('h3[class=prl-article-title]');
    foreach($posts as $post) {
        // Get the link
        $articles = $post->children(0)->href;
        echo $base,$articles.'<br>';
    }
    // Extract the next link, if not found return NULL
    //$nextLink = ( ($temp = $html->find('div[class=pagination]', 0)->last_child()) ? $temp->href : NULL );

    //$nextLink = ( ($temp = $html->find('div.pagination a[class="Next >>"]', 0)) ? "http://singersroom.com/subcontent/rnb-news/".$temp->href : NULL );
    $nextLink = ( ($temp = $html->find('div[class=pagination]', 0)->last_child()) ? "http://singersroom.com/subcontent/rnb-news/".$temp->href : NULL );

    //echo $temp;
    // Clear DOM object
    $html->clear();
    unset($html);
}

?>

Below is the result that I am getting:

nextLink: hxxp://singersroom.com/subcontent/rnb-news/
hxxp://singersroom.com/content/2014-04-18/Prince-Collabs-with-Warner-Bros-for-New-Music-Purple-Rain-Anniversary-Album/
hxxp://singersroom.com/content/2014-04-17/Tamar-Braxton-Adds-Tour-Dates-Thanks-Fans-For-Support/
hxxp://singersroom.com/content/2014-04-14/Tamar-Braxton-Readies-New-Album-Inks-Third-Season-of-Tamar-Vince/
hxxp://singersroom.com/content/2014-04-14/Jennifer-Hudson-Walk-It-Out-Ft-Timbaland/
hxxp://singersroom.com/content/2014-04-15/Kindred-The-Family-Soul-Everybodys-Hustlin/
hxxp://singersroom.com/content/2014-04-15/Lyrica-Anderson-Freakin-ft-Wiz-Khalifa/
hxxp://singersroom.com/content/2014-04-07/Dont-Worry-About-Them-10-Baby-Mothers-That-Are-Doing-Just-Fine/
hxxp://singersroom.com/content/2014-03-27/Top-Ten-Best-Soundtracks-From-The-90s/
hxxp://singersroom.com/content/2014-04-16/The-Forbes-Five-2014s-Wealthiest-Artists-in-Hip-Hop/

nextLink: hxxp://singersroom.com/subcontent/rnb-news/?page=2
hxxp://singersroom.com/content/2014-04-18/Prince-Collabs-with-Warner-Bros-for-New-Music-Purple-Rain-Anniversary-Album/
hxxp://singersroom.com/content/2014-04-17/Tamar-Braxton-Adds-Tour-Dates-Thanks-Fans-For-Support/
hxxp://singersroom.com/content/2014-04-14/Tamar-Braxton-Readies-New-Album-Inks-Third-Season-of-Tamar-Vince/
hxxp://singersroom.com/content/2014-04-14/Jennifer-Hudson-Walk-It-Out-Ft-Timbaland/
hxxp://singersroom.com/content/2014-04-15/Kindred-The-Family-Soul-Everybodys-Hustlin/
hxxp://singersroom.com/content/2014-04-15/Lyrica-Anderson-Freakin-ft-Wiz-Khalifa/
hxxp://singersroom.com/content/2014-04-07/Dont-Worry-About-Them-10-Baby-Mothers-That-Are-Doing-Just-Fine/
hxxp://singersroom.com/content/2014-03-27/Top-Ten-Best-Soundtracks-From-The-90s/
hxxp://singersroom.com/content/2014-04-16/The-Forbes-Five-2014s-Wealthiest-Artists-in-Hip-Hop/

. . .

nextLink: hxxp://singersroom.com/subcontent/rnb-news/?page=96
hxxp://singersroom.com/content/2014-04-18/Prince-Collabs-with-Warner-Bros-for-New-Music-Purple-Rain-Anniversary-Album/
hxxp://singersroom.com/content/2014-04-17/Tamar-Braxton-Adds-Tour-Dates-Thanks-Fans-For-Support/
hxxp://singersroom.com/content/2014-04-14/Tamar-Braxton-Readies-New-Album-Inks-Third-Season-of-Tamar-Vince/
hxxp://singersroom.com/content/2014-04-14/Jennifer-Hudson-Walk-It-Out-Ft-Timbaland/
hxxp://singersroom.com/content/2014-04-15/Kindred-The-Family-Soul-Everybodys-Hustlin/
hxxp://singersroom.com/content/2014-04-15/Lyrica-Anderson-Freakin-ft-Wiz-Khalifa/
hxxp://singersroom.com/content/2014-04-07/Dont-Worry-About-Them-10-Baby-Mothers-That-Are-Doing-Just-Fine/
hxxp://singersroom.com/content/2014-03-27/Top-Ten-Best-Soundtracks-From-The-90s/
hxxp://singersroom.com/content/2014-04-16/The-Forbes-Five-2014s-Wealthiest-Artists-in-Hip-Hop/
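
One part of the loop that can be checked in isolation is the pagination step, because calling last_child() on the result of find('div[class=pagination]', 0) is a fatal error on any page that has no pagination block. The following is only a minimal sketch of a guarded version, assuming PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of simple_html_dom so that it can run against a static snippet. The function name findNextLink, the sample markup and example.com are hypothetical, and this is not a confirmed fix for the repeated results.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns the listing URL
// joined with the href of the last link inside div.pagination, or null when
// the page has no pagination block or no link in it.
function findNextLink(string $pageHtml, string $listingUrl): ?string
{
    if ($pageHtml === '') {
        return null; // nothing was fetched, so there is nothing to parse
    }

    // Collect libxml warnings internally; real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($pageHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = $xpath->query('//div[@class="pagination"]//a[@href]');
    if ($links === false || $links->length === 0) {
        return null; // stop the loop instead of dereferencing null
    }

    // Same idea as last_child(): take the last link in the pagination block.
    $href = $links->item($links->length - 1)->getAttribute('href');
    return $listingUrl . $href;
}

// Assumed sample markup, for illustration only.
$sample = '<div class="pagination"><a href="?page=1">1</a><a href="?page=2">Next</a></div>';
var_dump(findNextLink($sample, 'http://example.com/list/'));
```

In the loop, the value returned here would take the place of the $nextLink assignment, so that a null ends the while loop cleanly.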

Spykey

1 Answer

Your links all use hxxp, which means that they are not valid links. Replace hxxp with http in your URLs, and you should be able to go on to the next step.

Charles Sarrazin
  • No, I changed it to hxxp because Stack Overflow will not allow me to post more than 2 links. In my code it is http:// – Spykey Apr 22 '14 at 10:26