How do you utilize proxy support with the Python web-scraping framework Scrapy?
9 Answers
Single Proxy
Enable `HttpProxyMiddleware` in your settings.py, like this:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 1
}
Then pass the proxy to the request via `request.meta`:

request = Request(url="http://example.com")
request.meta['proxy'] = "host:port"
yield request
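For context, here is a minimal, self-contained spider tying the two snippets together; the spider name, target URL, and proxy address are placeholders:

import scrapy

class SingleProxySpider(scrapy.Spider):  # hypothetical example spider
    name = "single_proxy"

    def start_requests(self):
        request = scrapy.Request(url="http://example.com", callback=self.parse)
        request.meta['proxy'] = "http://host:port"  # placeholder proxy address
        yield request

    def parse(self, response):
        self.logger.info("Fetched %s via the proxy", response.url)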
You can also choose a proxy address at random if you have an address pool, like this:
Multiple Proxies
import random

from scrapy import Request
from scrapy.spider import BaseSpider  # in modern Scrapy, subclass scrapy.Spider instead

class MySpider(BaseSpider):
    name = "my_spider"

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.proxy_pool = ['proxy_address1', 'proxy_address2', ..., 'proxy_addressN']

    def parse(self, response):
        # ...parse code...
        if something:
            yield self.get_request(url)

    def get_request(self, url):
        req = Request(url=url)
        if self.proxy_pool:
            req.meta['proxy'] = random.choice(self.proxy_pool)
        return req
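One caveat with the sketch above: requests generated automatically from start_urls do not go through get_request, so they will not use the pool. A minimal sketch of an extra method (assuming the same proxy_pool attribute from the class above) that routes the initial requests through a proxy too:

    def start_requests(self):
        # route the initial requests through get_request as well,
        # so they also pick a random proxy from the pool
        for url in self.start_urls:
            yield self.get_request(url)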
- The documentation says that `HttpProxyMiddleware` sets the proxy inside every Request's meta attribute, so enabling the middleware AND setting it manually would make no sense – Rafael T Dec 22 '14 at 20:16
- I should have copied this code. I glanced at it and then coded it myself, but the proxy functionality was not working. Now I see the proxy value was set on `request.headers` instead of `request.meta`. Stupid me (face palm)! I went to look at the `HttpProxyMiddleware` code: it skips the request if someone has already set `request.meta['proxy']` (see the sketch after these comments), so there is no need to list it in the settings: https://github.com/scrapy/scrapy/blob/master/scrapy/downloadermiddlewares/httpproxy.py – Thamme Gowda Jul 21 '17 at 03:48
- I am not sure I understand the difference between the two: is `BaseSpider` your original spider and `MySpider` the actual modified spider, or does `BaseSpider` refer to `scrapy.Spider`? – ishandutta2007 Dec 19 '19 at 10:46
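The skip check that Thamme Gowda's comment refers to looked roughly like this in the Scrapy source at the time (paraphrased; the exact code varies by Scrapy version):

    def process_request(self, request, spider):
        # ignore if a proxy is already set explicitly on the request
        if 'proxy' in request.meta:
            return
        ...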
From the Scrapy FAQ,

Does Scrapy work with HTTP proxies?

Yes. Support for HTTP proxies is provided (since Scrapy 0.8) through the HTTP Proxy downloader middleware. See `HttpProxyMiddleware`.
The easiest way to use a proxy is to set the environment variable http_proxy. How this is done depends on your shell:

C:\>set http_proxy=http://proxy:port
csh% setenv http_proxy http://proxy:port
sh$ export http_proxy=http://proxy:port
If you want to use an HTTPS proxy and visit HTTPS sites, set the environment variable https_proxy instead:

C:\>set https_proxy=https://proxy:port
csh% setenv https_proxy https://proxy:port
sh$ export https_proxy=https://proxy:port
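If you drive Scrapy from a script rather than from the shell, the same variables can be set from Python before the crawl starts, so HttpProxyMiddleware picks them up. A minimal sketch; the spider import path is hypothetical:

import os

# must be set before the crawler (and its middlewares) is created
os.environ["http_proxy"] = "http://proxy:port"
os.environ["https_proxy"] = "http://proxy:port"

from scrapy.crawler import CrawlerProcess
from myproject.spiders.example import ExampleSpider  # hypothetical import path

process = CrawlerProcess()
process.crawl(ExampleSpider)
process.start()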
- Thanks ... So I need to set this var before running the scrapy crawler; it's not possible to set or change it from the crawler code – no1 Jan 17 '11 at 11:59
- You can even set the proxy on a per-request basis with: request.meta['proxy'] = 'http://your.proxy.address' – Pablo Hoffman Jan 25 '11 at 19:35
- @ocean800 I use scrapy to scrape a website that shows your current IP to see if it's using the proxy. That way I can load the page via Chrome and see my actual IP and compare it to what scrapy sees on the same page. – Shannon Cole Jun 24 '18 at 12:53
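A sketch of the verification trick from the comment above, using httpbin.org/ip as an example IP-echo service (any similar service works; the proxy address is a placeholder):

import json
import scrapy

class IPCheckSpider(scrapy.Spider):  # hypothetical spider for verifying the proxy
    name = "ip_check"

    def start_requests(self):
        yield scrapy.Request(
            "https://httpbin.org/ip",
            meta={"proxy": "http://host:port"},  # placeholder proxy address
        )

    def parse(self, response):
        # the body looks like {"origin": "1.2.3.4"}; compare with your real IP
        self.logger.info("Seen as: %s", json.loads(response.text)["origin"])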
1 – Create a new file called "middlewares.py", save it in your scrapy project, and add the following code to it.
import base64

class ProxyMiddleware(object):
    # overwrite process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"

        # set up basic authentication for the proxy
        # (b64encode, not the deprecated encodestring, which appends a newline)
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
2 – Open your project's configuration file (./project_name/settings.py) and add the following code:

DOWNLOADER_MIDDLEWARES = {
    # in Scrapy >= 1.0 this path is 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware'
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}
Now your requests should be passed through this proxy. Simple, isn't it?
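Note that the order numbers matter: the custom ProxyMiddleware at 100 runs before HttpProxyMiddleware at 110, so the proxy set in request.meta is already in place when the built-in middleware runs, and it leaves it alone. If you only want the proxy for a single spider, one alternative is to scope the middleware with custom_settings instead of settings.py; a sketch, where the spider name and module path are placeholders:

import scrapy

class ProxiedSpider(scrapy.Spider):  # hypothetical spider
    name = "proxied"
    # enable the custom middleware for this spider only, instead of project-wide
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'project_name.middlewares.ProxyMiddleware': 100,
        },
    }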

- I implemented your solution, which looks correct, but I keep getting a Twisted error: twisted.web._newclient.ResponseNeverReceived: [ >] ANY ADVICE??? – ccdpowell May 07 '15 at 01:09
- Take care to use `base64.b64encode` instead of `base64.encodestring`, as the latter adds a newline character to the encoded base64 result...! See http://stackoverflow.com/a/32243566/426790 – Greg Sadetsky Feb 28 '16 at 03:03
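The difference Greg's comment describes, in a quick sketch (base64.encodestring was deprecated for years and removed in Python 3.9):

import base64

base64.b64encode(b"USERNAME:PASSWORD")   # b'VVNFUk5BTUU6UEFTU1dPUkQ='
# base64.encodestring(b"USERNAME:PASSWORD") on old Pythons returned the same
# value with a trailing b'\n', which corrupts the Proxy-Authorization header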
As I've had trouble setting the environment variable in /etc/environment, here is what I've put in my spider (Python):

import os

os.environ["http_proxy"] = "http://localhost:12345"
- Might as well add `os.environ["https_proxy"]` to it. Worked for me having both. – James Koss May 10 '19 at 05:08
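Combining the answer and the comment, a minimal sketch that sets both variables (the local port is a placeholder):

import os

# set both so that http:// and https:// pages go through the proxy
os.environ["http_proxy"] = "http://localhost:12345"
os.environ["https_proxy"] = "http://localhost:12345"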
There is a nice middleware written by someone else: scrapy-proxies ("Scrapy proxy middleware"): https://github.com/aivarsk/scrapy-proxies

Here is what I do:

Method 1:

Create a downloader middleware like this:

class ProxiesDownloaderMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = 'http://user:pass@host:port'
and enable it in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'my_scrapy_project_directory.middlewares.ProxiesDownloaderMiddleware': 600,
}
That is it; the proxy will now be applied to every request.
Method 2:

Just enable HttpProxyMiddleware in settings.py (in recent Scrapy versions it is already enabled by default) and then do this for each request:

yield Request(url=..., meta={'proxy': 'http://user:pass@host:port'})

- @rom it's the name of the folder your scrapy project is in, it's that simple – Umair Ayub Dec 14 '21 at 05:09
On Windows I put together a couple of the previous answers and it worked. I simply did:
C:\> set http_proxy=http://username:password@proxy:port
and then I launched my program:
C:/.../RightFolder> scrapy crawl dmoz
where "dmzo" is the program name (I'm writing it because it's the one you find in a tutorial on internet, and if you're here you have probably started from the tutorial).

I would recommend you use a middleware such as scrapy-proxies. You can rotate proxies, filter out bad proxies, or use a single proxy for all your requests. Also, using a middleware will save you the trouble of setting up the proxy on every run.
This is directly from the GitHub README.
Install the scrapy-proxies library
pip install scrapy_proxies
In your settings.py add the following settings
# Retry many times since proxies often fail
RETRY_TIMES = 10
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]
DOWNLOADER_MIDDLEWARES = {
'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
'scrapy_proxies.RandomProxy': 100,
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}
# Proxy list containing entries like
# http://host1:port
# http://username:password@host2:port
# http://host3:port
# ...
PROXY_LIST = '/path/to/proxy/list.txt'
# Proxy mode
# 0 = Every request has a different proxy
# 1 = Take only one proxy from the list and assign it to every request
# 2 = Put a custom proxy to use in the settings
PROXY_MODE = 0

# If proxy mode is 2, uncomment this line:
#CUSTOM_PROXY = "http://host1:port"
Here you can change the retry times and set a single or rotating proxy. Then add your proxies to a list.txt file like this:
http://host1:port
http://username:password@host2:port
http://host3:port
After this, all your requests for that project will be sent through the proxy. The proxy is rotated randomly for every request. It will not affect concurrency.

Note: if you do not want to use a proxy, you can simply comment out the scrapy_proxies middleware line.
DOWNLOADER_MIDDLEWARES = {
'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
# 'scrapy_proxies.RandomProxy': 100,
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}
Happy crawling!!!
