10

I'm new to Python and Scrapy and I'm walking through the Scrapy tutorial. I was able to create my project from the DOS command prompt by typing:

scrapy startproject dmoz

The tutorial later refers to the Crawl command:

scrapy crawl dmoz.org

But each time I try to run that, I get a message that this is not a valid command. From looking around further, it seems I need to be inside a project, and that's the part I can't figure out. I've tried changing directories into the "dmoz" folder that startproject created, but from there Scrapy isn't recognized at all.

I'm sure I'm missing something obvious and I'm hoping someone can point it out.

Uyghur Lives Matter
Adam Smith

2 Answers

9

You have to run it from inside the folder that startproject created. Scrapy offers additional commands (including crawl) when it finds your scrapy.cfg file. You can see the difference here:

$ scrapy startproject bar
$ cd bar/
$ ls
bar  scrapy.cfg
$ scrapy
Scrapy 0.12.0.2536 - project: bar

Usage:
  scrapy <command> [options] [args]

Available commands:
  crawl         Start crawling from a spider or URL
  deploy        Deploy project in Scrapyd target
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  queue         Deprecated command. See Scrapyd documentation.
  runserver     Deprecated command. Use 'server' command instead
  runspider     Run a self-contained spider (without creating a project)
  server        Start Scrapyd server for this project
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command


$ cd ..
$ scrapy
Scrapy 0.12.0.2536 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command
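
To check from Python whether a given directory counts as "inside a project", the sketch below looks for a scrapy.cfg file in that directory and then in each of its parents. It only illustrates the idea behind the two outputs above and is not Scrapy's own code; the function name find_scrapy_cfg is made up for this example, and searching parent folders is an assumption about how the lookup behaves.

```python
from pathlib import Path
from typing import Optional


def find_scrapy_cfg(start: Path) -> Optional[Path]:
    """Return the nearest scrapy.cfg at or above `start`, or None.

    Hypothetical helper for illustration only; not part of Scrapy.
    """
    start = start.resolve()
    # Check the starting directory first, then walk up through its parents.
    for directory in (start, *start.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    cfg = find_scrapy_cfg(Path.cwd())
    if cfg is None:
        print("no active project: cd into the folder that contains scrapy.cfg")
    else:
        print(f"active project config: {cfg}")
```

If this reports no active project, project-only commands such as crawl will most likely be missing from the command list as well.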
anders
  • Thanks Anders. That's what I assumed I should be doing; I'm glad I'm not crazy. I'm in DOS, and when I change directories and run "dir" I see my project folder and the scrapy.cfg file. However, when I run the scrapy command I get the response "scrapy is not recognized as an internal or external command, operable program or batch file". – Adam Smith Feb 17 '11 at 02:01
  • Also, here are the contents of the scrapy.cfg file in case it's being generated incorrectly:
        # Automatically created by: scrapy startproject
        #
        # For more information about the [deploy] section see:
        # http://doc.scrapy.org/topics/scrapyd.html
        [settings]
        default = Test1.settings
        [deploy]
        #url = http://localhost:6800/
        project = Test1
    – Adam Smith Feb 17 '11 at 02:02
  • 1
    So if "scrapy is not recognized", it should be the PATH environment variable. What if you try executing it with the full path? (I don't know how that works on Windows, sorry :S). Also, the .cfg seems to be fine. – anders Feb 17 '11 at 12:00
  • That did it! Thanks Anders, can't wait to dig into it. – Adam Smith Feb 18 '11 at 01:29
  • If you want to avoid the craziness of Scrapy taking over your project with its generated files and other garbage, use scrapy runspider foo.py. Unfortunately this means you can't do other things like feed exporters, but that's the braindead developers' fault. – user1244215 Aug 17 '12 at 02:37
2

The PATH environment variable isn't set.

You can add Python and Scrapy to the PATH environment variable by opening System Properties (My Computer > Properties > Advanced System Settings), going to the Advanced tab, and clicking the Environment Variables button. In the new window, scroll to the Path variable under System Variables and append the following entries, separated by semicolons:

C:\{path to python folder}
C:\{path to python folder}\Scripts

Example:

C:\Python27;C:\Python27\Scripts
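
After changing PATH, open a new command prompt and confirm the change took effect. A minimal sketch of such a check follows, assuming Python 3.3 or newer (shutil.which does not exist in Python 2.7); the helper names path_entries and locate_command are made up for this example.

```python
import os
import shutil
from typing import List, Optional


def path_entries(path_value: str) -> List[str]:
    """Split a PATH string into its non-empty entries (';' on Windows)."""
    return [entry for entry in path_value.split(os.pathsep) if entry]


def locate_command(name: str = "scrapy") -> Optional[str]:
    """Return the full path the shell would run for `name`, or None."""
    # shutil.which searches PATH the way the shell does; on Windows it also
    # tries the extensions listed in PATHEXT (.exe, .bat, ...).
    return shutil.which(name)


if __name__ == "__main__":
    for entry in path_entries(os.environ.get("PATH", "")):
        print(entry)
    location = locate_command("scrapy")
    if location is None:
        print("scrapy is not on PATH; add the Scripts folder as described above")
    else:
        print(f"scrapy found at: {location}")
```

If the Scripts folder is missing from the printed entries, that would explain the "not recognized as an internal or external command" message.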

Akersh