
I am trying to scrape all the sites listed on this website.

I will use www.site.com instead of the real domain to simplify the problem.

Basically, there is a list of around 300,000 sites with 30 results per page, so there should be around 10,000 pages.

This is an example:

www.site.com/1 -> sites from 1-30

www.site.com/2 -> sites from 31-60

www.site.com/3 -> sites from 61-90

www.site.com/4 -> sites from 91-120
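The page-to-results mapping above is plain arithmetic. A minimal sketch, assuming exactly 30 results per page and the placeholder domain www.site.com (the helper names `page_range`, `page_url` and `pages_needed` are hypothetical):

```python
from __future__ import annotations

PER_PAGE = 30  # results per listing page, as described above


def page_range(page: int, per_page: int = PER_PAGE) -> tuple[int, int]:
    """Return the 1-based (first, last) result numbers shown on `page`."""
    first = (page - 1) * per_page + 1
    last = page * per_page
    return first, last


def page_url(page: int, base: str = "https://www.site.com") -> str:
    """Build the listing URL for `page` (www.site.com is a placeholder)."""
    return f"{base}/{page}"


def pages_needed(total: int, per_page: int = PER_PAGE) -> int:
    """Pages required to list `total` results (ceiling division)."""
    return -(-total // per_page)


print(page_range(2))           # (31, 60)
print(page_url(4))             # https://www.site.com/4
print(pages_needed(300_000))   # 10000
```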

The problem is that page 167 is the last page with results; nothing is shown after it. As a result, I can only see the first 5,000 sites (167 pages × 30 results ≈ 5,000).
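The page-167 cutoff is consistent with a hard limit of 5,000 results. A quick check, assuming that limit (`RESULT_CAP` is an inferred value, not something the site documents) and the rough total of 300,000:

```python
import math

PER_PAGE = 30
RESULT_CAP = 5000   # assumed server-side limit, inferred from page 167 being the last page
TOTAL = 300_000     # approximate total from the question

last_page = math.ceil(RESULT_CAP / PER_PAGE)     # last page that still has results
shown = min(RESULT_CAP, last_page * PER_PAGE)    # results actually reachable by paging
print(last_page, shown, f"{shown / TOTAL:.1%}")  # 167 5000 1.7%
```

So plain paging can only ever reach a small fraction of the list, no matter how the scraper is written.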

When I open the next page:

www.site.com/168

I get this error: PHP Warning – yii\base\ErrorException

Click HERE to see full error.

I was able to write a Python script that scrapes the first 5,000 sites, but I have no idea how to access the full list.
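A paging scraper like the one described can stop cleanly at the cap instead of tripping over the error page. A minimal sketch, assuming hypothetical `fetch(page)` and `parse(html)` hooks (in practice backed by something like `requests` and an HTML parser) and that pages past the cap render a page containing "ErrorException". The demo uses a fake server so it runs offline:

```python
from typing import Callable, List, Optional


def crawl(fetch: Callable[[int], Optional[str]],
          parse: Callable[[str], List[str]],
          max_pages: int = 10_000) -> List[str]:
    """Walk the listing pages in order and stop at the first page that fails.

    fetch(page) -> page HTML, or None on a network/HTTP error (hypothetical hook).
    parse(html) -> list of site names on that page (hypothetical hook).
    """
    results: List[str] = []
    for page in range(1, max_pages + 1):
        html = fetch(page)
        # Past the cap the server renders a Yii error page instead of results.
        if html is None or "ErrorException" in html:
            break
        items = parse(html)
        if not items:  # an empty listing also means we ran out of pages
            break
        results.extend(items)
    return results


# Offline demo: a fake server that mimics the 5,000-result cap.
def fake_fetch(page: int) -> Optional[str]:
    if page > 167:
        return "PHP Warning - yii\\base\\ErrorException"
    first, last = (page - 1) * 30 + 1, min(page * 30, 5000)
    return ",".join(f"site{i}" for i in range(first, last + 1))


def fake_parse(html: str) -> List[str]:
    return html.split(",") if html else []


sites = crawl(fake_fetch, fake_parse)
print(len(sites))  # 5000
```

This only makes the 5,000-result limit explicit; it does not get around it.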

For example, the site lets you search by keyword, but again, if a search returns more than 5,000 results, only the first 5,000 sites are shown.
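One common workaround for a per-query cap is to split the list into many narrower queries and merge the results. A minimal sketch, assuming a hypothetical `search(keyword)` function that returns at most `cap` site names per query. It de-duplicates the union and reports which keywords hit the cap (those were truncated and would need narrower keywords). Coverage is still not guaranteed: a site that matches none of the keywords is never seen.

```python
from typing import Callable, Iterable, List, Set, Tuple


def collect_by_keywords(search: Callable[[str], List[str]],
                        keywords: Iterable[str],
                        cap: int = 5000) -> Tuple[Set[str], List[str]]:
    """Union the (capped) results of several keyword searches.

    Returns the de-duplicated set of sites plus the keywords whose result
    lists reached the cap; those queries were truncated and would need to be
    split into narrower keywords to see the rest.
    """
    sites: Set[str] = set()
    saturated: List[str] = []
    for keyword in keywords:
        hits = search(keyword)
        if len(hits) >= cap:
            saturated.append(keyword)
        sites.update(hits)
    return sites, saturated


# Offline demo with a tiny fake catalogue and a cap of 3 results per search.
CATALOGUE = ["manchester-shop", "mango-store", "woman-wear",
             "womanly", "manual-tools", "bolt"]


def fake_search(keyword: str, cap: int = 3) -> List[str]:
    return [s for s in CATALOGUE if keyword in s][:cap]


sites, saturated = collect_by_keywords(fake_search, ["man", "woman"], cap=3)
print(sorted(sites))  # ['manchester-shop', 'mango-store', 'woman-wear', 'womanly']
print(saturated)      # ['man']
```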

Any ideas on how to solve this problem?

Lenny
  • https://www.shopistores.com/shopify/167 doesn't show any pages past 167. How do you know the target website actually has 307,547 records? It might be a made up number. – Chillie Nov 06 '20 at 13:01
  • It's not a made-up number. I ran two searches for two different keywords, for example "man" and "woman", and each of them had 15k+ results. I scraped the first 5k sites for each keyword, compared them, and all of the sites were unique. There were no duplicates. I am sure that there are actually 307547 records, but I don't know how to access them. – Lenny Nov 06 '20 at 13:03
  • Unless you find an internal API, which this site doesn't seem to use, you can [contact the website's owner](https://www.shopistores.com/contact/) about getting more results. If the owner set a limit of 5000 results per category, you're not getting past it. – Chillie Nov 06 '20 at 13:13
  • Next time, try to include more code. The error itself contains your answer: in /mnt/htdocs/myyii2/controllers/SiteController.php the exception occurs because $res doesn't have a key named "matches". Adding an if statement before the foreach could solve your problem. – Nirav Bhoi Nov 06 '20 at 13:27
  • Take a look here : https://stackoverflow.com/questions/2630013/invalid-argument-supplied-for-foreach – Nirav Bhoi Nov 06 '20 at 13:35
  • Thank you, but that is not my site, so I can't edit the code to add a check before the foreach. That's the problem: I only want to scrape data from that page. – Lenny Nov 06 '20 at 13:45
