
I seem to have run into an issue with a Scrapy spider deployment that is causing some listening errors. I haven't been able to apply any of the previous answers successfully, either because mine is a different issue or because the fixes weren't detailed enough for me to follow.

I've got a project uploaded, and the deploy command worked yesterday. Now I'm toying with it again, and when I run scrapy deploy -l to see the list of deploy targets, I get this:

Scrapy 0.24.4 - no active project

Unknown command: deploy

Use "scrapy" to see available commands

One common fix seems to be restarting Scrapyd with the command scrapyd. When I do that, I get:

2014-09-17 01:58:47+0000 [-] Log opened.
2014-09-17 01:58:47+0000 [-] twistd 13.2.0 (/usr/bin/python 2.7.6) starting up.
2014-09-17 01:58:47+0000 [-] reactor class: twisted.internet.epollreactor.EPollReactor.
2014-09-17 01:58:47+0000 [-] Traceback (most recent call last):
2014-09-17 01:58:47+0000 [-]   File "/usr/bin/scrapyd", line 8, in <module>
2014-09-17 01:58:47+0000 [-]     run()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/scripts/twistd.py", line 27, in run
2014-09-17 01:58:47+0000 [-]     app.run(runApp, ServerOptions)
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/app.py", line 642, in run
2014-09-17 01:58:47+0000 [-]     runApp(config)
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/scripts/twistd.py", line 23, in runApp
2014-09-17 01:58:47+0000 [-]     _SomeApplicationRunner(config).run()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/app.py", line 380, in run
2014-09-17 01:58:47+0000 [-]     self.postApplication()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/scripts/_twistd_unix.py", line 193, in postApplication
2014-09-17 01:58:47+0000 [-]     self.startApplication(self.application)
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/scripts/_twistd_unix.py", line 381, in startApplication
2014-09-17 01:58:47+0000 [-]     service.IService(application).privilegedStartService()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/service.py", line 277, in privilegedStartService
2014-09-17 01:58:47+0000 [-]     service.privilegedStartService()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/internet.py", line 105, in privilegedStartService
2014-09-17 01:58:47+0000 [-]     self._port = self._getPort()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/internet.py", line 133, in _getPort
2014-09-17 01:58:47+0000 [-]     'listen%s' % (self.method,))(*self.args, **self.kwargs)
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 495, in listenTCP
2014-09-17 01:58:47+0000 [-]     p.startListening()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 980, in startListening
2014-09-17 01:58:47+0000 [-]     raise CannotListenError(self.interface, self.port, le)
2014-09-17 01:58:47+0000 [-] twisted.internet.error.CannotListenError: Couldn't listen on 0.0.0.0:6800: [Errno 98] Address already in use.

That appears to be some sort of listening error, based on the traceback and some other questions posted here, but I just can't figure out which solution should work or what to tweak.
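As far as I can tell, Errno 98 (EADDRINUSE) just means another process is already bound to 0.0.0.0:6800, so running scrapyd by hand collides with an instance that is already up. A minimal sketch of the same error class, illustrative only and not scrapyd's own code:

```python
import errno
import socket

# Illustration only: "[Errno 98] Address already in use" is what binding
# a second socket to an address that is already held looks like.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))          # let the OS pick a free port
port = first.getsockname()[1]
first.listen(1)

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))  # same port while `first` holds it
    caught = None
except OSError as exc:                # socket.error on Python 2
    caught = exc.errno
finally:
    second.close()
    first.close()

print(caught == errno.EADDRINUSE)
```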

EDIT:

Here's what I'm getting after I restart Scrapyd:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:6800            0.0.0.0:*               LISTEN      956/python      
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1004/sshd       
tcp6       0      0 :::22                   :::*                    LISTEN      1004/sshd       
udp        0      0 0.0.0.0:14330           0.0.0.0:*                           509/dhclient    
udp        0      0 0.0.0.0:68              0.0.0.0:*                           509/dhclient    
udp6       0      0 :::3311                 :::*                                509/dhclient
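So port 6800 is held by PID 956 (the old scrapyd, presumably). When netstat isn't handy, the same check can be sketched from Python — the function name here is my own, nothing scrapyd-specific:

```python
import socket

def port_in_use(host, port):
    """Return True if something accepts TCP connections on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(1.0)
    try:
        return sock.connect_ex((host, port)) == 0  # 0 means connect succeeded
    finally:
        sock.close()

# Demo against a listener we control, since 6800 may be free on your box:
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
demo_port = listener.getsockname()[1]
listener.listen(1)

busy = port_in_use("127.0.0.1", demo_port)   # True while the listener is up
listener.close()
free = port_in_use("127.0.0.1", demo_port)   # False once it is closed
print(busy, free)
```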

EDIT 2:

So I traced back and started again in my local project directory to try to figure out where this all went wrong. Here's what I get now when I try to list the targets locally:

Christophers-MacBook-Pro:shn Chris$ scrapy deploy -l
aws-target           http://*********.compute-1.amazonaws.com:6800/
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.22.2', 'scrapy')
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 489, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 1207, in run_script
    execfile(script_filename, namespace, namespace)
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/commands/deploy.py", line 76, in run
    print("%-20s %s" % (name, target['url']))
KeyError: 'url'

EDIT 3:

Here's the config file...

# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/en/latest/topics/scrapyd.html

[settings]
default = shn.settings

[deploy:local-target]
#url = http://localhost:6800/
project = shn

[deploy:aws-target]
url = http://********.compute-1.amazonaws.com:6800/
project = shn

For what it's worth, I can now run it again with the curl option, and it saves a log file and an output on the AWS :6800 page. The scrapy deploy command still gives me the error I posted before, though.

Chris

2 Answers


It sounds like scrapyd is still running, so Twisted hasn't released the port. Can you confirm that using netstat:

$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      1048/python
tcp        0      0 0.0.0.0:6800            0.0.0.0:*               LISTEN      1434/python
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      995/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      995/sshd
udp        0      0 127.0.0.1:8125          0.0.0.0:*                           1047/python
udp        0      0 0.0.0.0:68              0.0.0.0:*                           493/dhclient
udp        0      0 0.0.0.0:16150           0.0.0.0:*                           493/dhclient
udp6       0      0 :::28687                :::*                                493/dhclient

Kill scrapyd:

$ sudo kill -INT $(cat /var/run/scrapyd.pid)

Then restart:

$ sudo service scrapyd start

Then cd into the project directory and make sure you have defined a deploy target in the scrapy.cfg file:

$ cd ~/takeovertheworld
vagrant@portia:~/takeovertheworld$ cat scrapy.cfg

# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/en/latest/topics/scrapyd.html

[settings]
default = takeovertheworld.settings

[deploy:local-target]
url = http://localhost:6800/
project = takeovertheworld

[deploy:aws-target]
url = http://my-ec2-instance.amazonaws.com:6800/
project = takeovertheworld

and deploy the project:

vagrant@portia:~/takeovertheworld$ scrapy deploy aws-target
Packing version 1410145736
Deploying to project "takeovertheworld" in http://ec2-xx-xxx-xx-xxx.compute-1.amazonaws.com:6800/addversion.json
Server response (200):
{"status": "ok", "project": "takeovertheworld", "version": "1410145736", "spiders": 1}

Edit your scrapy.cfg file: remove the # from the url line in local-target, or remove local-target entirely if you don't need it.
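To see why the commented-out line produces KeyError: 'url' rather than a friendlier message: ConfigParser skips # lines entirely, so the parsed local-target section simply has no url option, and the print loop in deploy.py trips over the missing key. A small sketch (Python 3 configparser for brevity, though the question's traceback is Python 2; the hostname is a placeholder):

```python
from configparser import ConfigParser

# Mirror of the scrapy.cfg from the question: url commented out in
# local-target, placeholder hostname in aws-target.
cfg_text = """
[deploy:local-target]
#url = http://localhost:6800/
project = shn

[deploy:aws-target]
url = http://example.compute-1.amazonaws.com:6800/
project = shn
"""

parser = ConfigParser()
parser.read_string(cfg_text)

rows = []
for section in parser.sections():
    name = section.split(":", 1)[1]
    target = dict(parser.items(section))
    try:
        # Same formatting as the failing line in scrapy/commands/deploy.py
        rows.append("%-20s %s" % (name, target["url"]))
    except KeyError as exc:
        rows.append("%-20s KeyError: %s" % (name, exc))

for row in rows:
    print(row)
```

Uncommenting the url line (or deleting the whole local-target section) gives every target a url key, which is why the fix above works.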

dataisbeautiful
  • Hmm. Doesn't seem to work. When I do that, I get the following back after the second step: /usr/bin/python: can't open file '/usr/local/bin/twistd': [Errno 2] No such file or directory – Chris Sep 18 '14 at 01:14
  • Edited my answer above, try using service to start scrapyd. – dataisbeautiful Sep 18 '14 at 01:32
  • Tried that. Appears to have successfully restarted Scrapyd, that's for sure. But I'm still getting the deploy error. See my above edit to see what I saw when I went through your new first step. – Chris Sep 18 '14 at 01:38
  • You need to be in the project directory and have the deploy target specified in your scrapy.cfg you can then deploy using `scrapy deploy target` – dataisbeautiful Sep 18 '14 at 01:41
  • Ahh, yeah. This is even weirder: The Scrapy page at :6800 shows I have the two available projects I uploaded yesterday and tested fine. But now the EC2 doesn't seem to think either exists, I guess. When I list files in the /home/ubuntu directory, there's nothing. Can't figure out how to navigate back to those, but they must still exist if they're showing up on the Scrapy page, no? I'm tempted to just scrap the instance and start over but I'm afraid if I don't figure this out, it'll happen again and I'll be right back in the same spot. – Chris Sep 18 '14 at 02:30
  • The projects will still exist in `/var/lib/scrapyd/` as deployed eggs which is why you can see them in scrapyd. Try `locate scrapy.cfg` to see if you accidentally created the projects somewhere else otherwise I'd say they were deleted. Were you deploying from your local machine or from scrapy on the EC2 instance? – dataisbeautiful Sep 18 '14 at 02:37
  • I edited the question again at the end. Now it looks like I may have an issue locally. No idea what I'm getting stuck on. This worked like a charm the first time I tried it. Come back a day later and it's a mess. – Chris Sep 19 '14 at 01:53
  • Can you post your `scrapy.cfg`, looks like a config error there. – dataisbeautiful Sep 19 '14 at 02:18
  • Just posted it under Edit 3. Thanks! – Chris Sep 19 '14 at 02:21
  • uncomment the url parameter under local-target – dataisbeautiful Sep 19 '14 at 02:31
  • That was it. I wonder how I did that. On a side note, I just realized it's your blog I've been using to go through the tutorial, so you've been helping me on all fronts. Can't thank you enough. Been stuck on this process for months and looking forward to your next post on it. Thanks! – Chris Sep 19 '14 at 02:51

Try stopping and restarting the scrapyd service on your Amazon EC2 server, and make sure your config file has the correct deployment information:

    [deploy:deploye_name]
    url = http://ip_Address:port_number/
    project = your_project_name

Then go to the project directory where scrapy.cfg exists, and check for available deploy targets:

    scrapy deploy -l
Tasawer Nawaz
  • This didn't do anything for me either. And the project worked a night earlier, so I know it was active. Though judging by the fact that the aws DNS address:6800 page is down, I'm guessing Scrapyd is down, so that is probably the problem. Just not sure how to start it up again, I guess. – Chris Sep 18 '14 at 01:25