Let me try to explain, based on the sample spider code shown on the Scrapy website. I saved it in a file named scrapy_example.py:
from scrapy import Spider, Item, Field

class Post(Item):
    # Container for one scraped blog post
    title = Field()

class BlogSpider(Spider):
    name, start_urls = 'blogspider', ['http://blog.scrapinghub.com']

    def parse(self, response):
        # Return one Post item for the text of each <h2><a> element
        return [Post(title=e.extract()) for e in response.css("h2 a::text")]
Executing this with the command scrapy runspider scrapy_example.py produces output like the following:
(...)
DEBUG: Crawled (200) <GET http://blog.scrapinghub.com> (referer: None) ['partial']
DEBUG: Scraped from <200 http://blog.scrapinghub.com>
{'title': u'Using git to manage vacations in a large distributed\xa0team'}
DEBUG: Scraped from <200 http://blog.scrapinghub.com>
{'title': u'Gender Inequality Across Programming\xa0Languages'}
(...)
Crawled means that Scrapy has downloaded that webpage.
Scraped means that Scrapy has extracted some data from that webpage.
The URL is the one given in the script as the start_urls attribute.
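If you want to double-check which URLs a spider starts from, you can inspect that attribute directly; a minimal sketch, assuming scrapy_example.py is on the import path:

from scrapy_example import BlogSpider

# start_urls is a plain class attribute, so it can be read
# without running the crawler at all.
print(BlogSpider.start_urls)  # ['http://blog.scrapinghub.com']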
Your output must have been generated by running a spider. Search the file where that spider is defined and you should be able to spot the place where the URL is set. Keep in mind it is not always a start_urls attribute: a spider can also build its initial requests in a start_requests() method, as sketched below.
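Here is a minimal sketch of that variant, with the same spider rewritten so the URL lives in start_requests() instead of start_urls (it also yields plain dicts, which Scrapy accepts from spiders since version 1.0):

from scrapy import Spider, Request

class BlogSpider(Spider):
    name = 'blogspider'

    # No start_urls here: the initial request is built explicitly.
    # Scrapy's default start_requests() does essentially this for
    # every entry in start_urls.
    def start_requests(self):
        yield Request('http://blog.scrapinghub.com', callback=self.parse)

    def parse(self, response):
        for e in response.css("h2 a::text"):
            yield {'title': e.extract()}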