
My website is publicly available, and Scrapyd is running on port 6800, e.g. http://website.com:6800/.

I do not want anyone to see the list of my crawlers. I know anyone can easily type in port 6800 and see what's going on.

I have a few questions; an answer to any of them will help me.

  1. Is there a way to password protect the Scrapyd UI?
  2. Can I password protect a specific port on Linux? I know it can be done with iptables to ONLY ALLOW PARTICULAR IPs (sketched after this list), but that's not a good solution.
  3. Should I make changes to Scrapyd's source code?
  4. Can I password protect a specific port only via .htaccess?
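
For reference, the iptables allow-list from item 2 would look roughly like this (203.0.113.5 stands in for an allowed IP; a sketch of the approach, not a recommendation):

iptables -A INPUT -p tcp --dport 6800 -s 203.0.113.5 -j ACCEPT
iptables -A INPUT -p tcp --dport 6800 -j DROP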
Umair Ayub

3 Answers


You should bind Scrapyd to the address of the machine that is going to make the calls.

If it's localhost that will call the endpoints, just bind it to 127.0.0.1 and voilà, the address no longer works for external IPs.
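
A minimal sketch of that setting in scrapyd.conf (bind_address and http_port are standard Scrapyd options):

[scrapyd]
# Listen on the loopback interface only, so the UI and JSON API
# are unreachable from external IPs
bind_address = 127.0.0.1
http_port    = 6800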

Rafael Almeida

Use the latest version of Scrapyd (1.2.1 as of this writing), which supports HTTP basic auth. Install it straight from the git repository:

pip install git+https://github.com/scrapy/scrapyd.git

Then enable auth by adding a username and password to scrapyd.conf, as below:

[scrapyd]
eggs_dir    = /var/lib/scrapyd/eggs
...
username    = username_here
password    = password_here
...
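
Once auth is enabled, every call to the JSON API has to carry the credentials, e.g. with curl (daemonstatus.json is a standard Scrapyd endpoint):

curl -u username_here:password_here http://website.com:6800/daemonstatus.json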
Basil Jose

As of Scrapyd version 1.2.0, the default bind address is 127.0.0.1.

To add password protection, use this gist, which sets up nginx as a reverse proxy that adds basic authentication in front of Scrapyd.
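In outline, that reverse-proxy setup looks like the following (assuming Scrapyd is bound to 127.0.0.1:6800 and a credentials file created with htpasswd lives at /etc/nginx/.htpasswd; both are illustrative):

server {
    listen 6801;
    server_name website.com;

    location / {
        # Require credentials before anything is proxied to Scrapyd
        auth_basic           "Scrapyd";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:6800;
    }
}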

You may also check the scrapyd-authenticated repository.

Levon