-3

I would like to scrape Google search results up to page 2, but I'm getting either a blank page on my website or a timeout.

for ($j = 0; $j < $acount; $j++) {
    sleep(60); // wait between queries
    // start=0 is result page 1, start=10 is result page 2
    for ($sp = 0; $sp <= 10; $sp += 10) {
        $url = 'http://www.google.'.$lang.'/search?q='.$in.'&start='.$sp;
        // NOTE: $data must hold the HTML fetched from $url; that fetch is not shown in this post.
        $datenbank = "proxy_work.php";
        // Overwrite the file for page 1 ("w+"), append to it for page 2 ("a+").
        $datei = fopen($datenbank, $sp == 10 ? "a+" : "w+");
        fwrite($datei, $data);
        fwrite($datei, "\r\n");
        fclose($datei);
    }
    // Parse the saved pages with Simple HTML DOM and walk every anchor.
    $html = file_get_html("proxy_work.php");
    foreach ($html->find('a') as $e) {
        //  $title = $h3->innertext;
        $link = $e->href;
        if (in_array($endomain, $approveurl)) {
            // (empty in the posted code)
        }
        // If it is not a direct link but a URL reference is found inside it, extract that URL.
        if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&amp;sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
            $link = $matches[1];
        } else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
            continue;
        }
    }
}

  • Why start a new thread about this? Stay with this one: https://stackoverflow.com/questions/59747547/how-can-i-scrape-the-google-search-result-2nd-page-only – CodyKL Jan 16 '20 at 06:17
  • And as in your first question about this issue, there is no line of code where you actually fetch the requested page; you have only defined the URL. – CodyKL Jan 16 '20 at 06:19
  • Does this answer your question? [How can i scrape the google search result 2nd page only?](https://stackoverflow.com/questions/59747547/how-can-i-scrape-the-google-search-result-2nd-page-only) – gehbiszumeis Jan 16 '20 at 06:30
  • It is different from that topic. This topic is about scraping up to 2 pages, but that topic discusses scraping only page 2: https://stackoverflow.com/questions/59747547/how-can-i-scrape-the-google-search-result-2nd-page-only – Marc Justin Rait Jan 16 '20 at 06:36
  • CodyKL, I get the result of the requested page by saving it to the file named proxy_work.php – Marc Justin Rait Jan 16 '20 at 06:43

1 Answer

0

Google search result pages (SERPs) are not like an ordinary website with static HTML. Google protects its data against web scraping. Treat its data like a business directory and consider the following tips for scraping business directories:

  1. IP-proxying.
  2. Imitating human behaviour by using some browser automation tools (Selenium, iMacros and others).

Read more here.
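
As a rough illustration of the first tip, here is a minimal PHP sketch that builds the URLs for the first two result pages and fetches each one through a different proxy using cURL. The function names (`buildSearchUrls`, `pickProxy`, `fetchViaProxy`), the round-robin rotation, the timeouts and the User-Agent string are all assumptions made for this example, not something taken from the question, and proxying alone may still not be enough to avoid being blocked.

```php
<?php
// Sketch only: function names, timeouts and the User-Agent are illustrative assumptions.

/** Build the result-page URLs: start=0 is page 1, start=10 is page 2, and so on. */
function buildSearchUrls(string $tld, string $query, int $pages = 2): array
{
    $urls = [];
    for ($page = 0; $page < $pages; $page++) {
        $urls[] = 'https://www.google.' . $tld . '/search?q=' . urlencode($query)
                . '&start=' . ($page * 10);
    }
    return $urls;
}

/** Pick a proxy round-robin so that consecutive requests leave from different IPs. */
function pickProxy(array $proxies, int $requestIndex): string
{
    return $proxies[$requestIndex % count($proxies)];
}

/** Fetch one URL through the given proxy; returns the body, or null on failure. */
function fetchViaProxy(string $url, string $proxy, int $timeoutSeconds = 20): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_PROXY          => $proxy,          // e.g. "host:port" of a proxy you control
        CURLOPT_CONNECTTIMEOUT => 10,              // fail fast on a dead proxy
        CURLOPT_TIMEOUT        => $timeoutSeconds, // upper bound for the whole request
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (placeholder user agent)',
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

To use it, loop over `buildSearchUrls('com', 'your query')` with `foreach ($urls as $i => $url)` and call `fetchViaProxy($url, pickProxy($proxies, $i))`, where `$proxies` is your own list of proxy addresses. Check for a `null` return before parsing, since a failed request would otherwise produce an empty page.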

Igor Savinkin