I am using Python Beautiful Soup for website scraping. My program hits different URLs of a website more than a thousand times, and I do not wish to get banned. As a first step, I would like to introduce IP masking in my project. Is there any possible way to hit different URLs of a website from a pool of rotating IPs with the help of Python modules like ipaddress, socket, etc.?
- This sounds suspiciously malicious / illegal / bad. – Brendan Dec 10 '14 at 07:10
- No Brendan, I only have good intentions, a "hunger for knowledge"; I am just scraping the prices and details of products to make a comparison. – V Manikandan Dec 10 '14 at 07:27
- I'll give you the benefit of the doubt, but if you truly have good intentions: if a website is banning you, **do not** keep doing it like that. You are subject to their conditions and shouldn't try to *bypass* them. If you are legitimately sending thousands of requests to a website, try contacting them instead. – Brendan Dec 10 '14 at 07:38
- Yeah, you are right. The actual purpose behind the question is to find out whether there is any way to do this, so I just phrased it like that. If I had any bad intention, I would not have asked this question here. – V Manikandan Dec 10 '14 at 08:18
1 Answer
The problem is your public IP address. What you can do is use a list of proxies and rotate through them.
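A minimal sketch of that idea with the `requests` library (the proxy addresses below are placeholders for proxies you actually control or rent):

```python
import random
import requests

# Placeholder proxy pool -- substitute proxies you are allowed to use.
PROXIES = [
    "http://10.10.1.10:3128",
    "http://10.10.1.11:3128",
    "http://10.10.1.12:3128",
]

def fetch(url):
    """Fetch a URL through a randomly chosen proxy from the pool."""
    proxy = random.choice(PROXIES)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
```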

rmarques
- I can get a list of proxies, but how can I make use of it to hit the URL each time with a different IP? – V Manikandan Dec 10 '14 at 10:09
- Whatever tool you are using to make the HTTP requests (urllib, requests, ...) should have support for proxies; a sketch follows below. Check the documentation of the tool you are using. Also, if you are going to do crawling/scraping on a large scale, consider using a framework like Scrapy. – rmarques Dec 10 '14 at 10:27
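For example, with `requests` and Beautiful Soup (the proxy address and target URL here are placeholders):

```python
import requests
from bs4 import BeautifulSoup

# Placeholder proxy address and target URL
proxy = "http://10.10.1.10:3128"
response = requests.get(
    "http://example.com/products",
    proxies={"http": proxy, "https": proxy},
    timeout=10,
)
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title)
```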
- I have already started with the Python Scrapy framework. I only have http://doc.scrapy.org/en/latest/topics/practices.html#bans for Scrapy. I searched, but did not find better documentation. – V Manikandan Dec 10 '14 at 10:44
- This is a common problem and there are a lot of resources out there about it. You can start with the wiki of the Scrapy project on GitHub https://github.com/scrapy/scrapy/wiki (Using Scrapy with different/many proxies). Ultimately you can use a commercial service like Scrapinghub and Crawlera http://scrapinghub.com/crawlera – rmarques Dec 10 '14 at 10:54
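One common way to do this in Scrapy, sketched below, is a small custom downloader middleware that picks a proxy per request; the module path and proxy list are placeholders. Scrapy's built-in HttpProxyMiddleware, enabled by default, will honour `request.meta["proxy"]`.

```python
# middlewares.py -- sketch of a rotating-proxy downloader middleware
import random

class RandomProxyMiddleware:
    # Placeholder proxy list -- substitute your own proxies
    PROXIES = [
        "http://10.10.1.10:3128",
        "http://10.10.1.11:3128",
    ]

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware applies request.meta["proxy"]
        request.meta["proxy"] = random.choice(self.PROXIES)

# settings.py -- enable the middleware (module path is hypothetical)
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RandomProxyMiddleware": 350,
}
```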