I'm using ScraperWiki to build a simple screen scraper that collects links from an online store. The store has multiple pages, so I want to get all the links from the first page, find the "next" button in the pager, go to that URL, collect all the links there, go to the next page, and so on.
Here's where I'm at. ScraperWiki uses Simple HTML DOM and CSS selectors:
<?php
require 'scraperwiki/simple_html_dom.php';

function nextPage(){
    $next = $html->find("li.pager-next a");
    $nextUrl = 'http://www.domain.com';
    $nextUrl .= $next->href . "\n";
    getLinks($nextUrl);
}

function getLinks($url){ // gets links from product list page
    $html_content = scraperwiki::scrape($url);
    $html = str_get_html($html_content);
    $x = 0;
    foreach ($html->find("div.views-row a.imagecache-product_list") as $el) {
        $url = $el->href . "\n";
        $allLinks[$x] = 'http://www.domain.com';
        $allLinks[$x] .= $url;
        $x++;
    }
    nextPage();
}

getLinks("http://www.domain.com/foo/bar");
print_r($allLinks);
?>
The code inside getLinks() works fine when it is NOT wrapped in a function, but I get "undeclared variable" errors as soon as I put it in a function. My question is:
In PHP, can I declare empty variables/arrays at the top of the script and use them throughout, like in JavaScript? I've read a few answers here on Stack that seem to imply there is no need to declare variables, which seems odd.
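To make the scoping question concrete, here is a minimal sketch that is separate from the scraper above. It assumes nothing beyond plain PHP, and the function names (addLinkBroken, addLinkGlobal, addLinkReturned) are hypothetical, made up purely for illustration. It shows that a variable declared at script level is not visible inside a function unless it is imported with the global keyword or passed in as an argument:

```php
<?php
// Hypothetical example: these helper names are illustrative only.

$allLinks = array();     // declared once at script (global) scope

function addLinkBroken($url) {
    // $allLinks here is a brand-new LOCAL variable;
    // the script-level array is not visible inside the function.
    $allLinks[] = $url;
}

function addLinkGlobal($url) {
    global $allLinks;    // import the script-level variable into this scope
    $allLinks[] = $url;
}

function addLinkReturned(array $links, $url) {
    // Alternative without globals: take the array in, return the updated copy.
    $links[] = $url;
    return $links;
}

addLinkBroken('http://www.domain.com/a');
$afterBroken = count($allLinks);   // the script-level array was not touched

addLinkGlobal('http://www.domain.com/b');
$afterGlobal = count($allLinks);   // the script-level array now has one entry

$allLinks = addLinkReturned($allLinks, 'http://www.domain.com/c');
print_r($allLinks);
```

Passing the array in and returning it is usually easier to follow than relying on global, but either approach makes the array available inside the function.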