I am using scrapy to scrap craigslist and get all links, go to that link, store the description for each page and email for reply. Now I have written a scrapy script which gors through the craigslist/sof.com and gets all job titles and urls. I want to go inside each url and save the email and description per job. Heres my code :
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist.items import CraigslistItem
class MySpider(BaseSpider):
name = "craig"
allowed_domains = ["craigslist.org"]
start_urls = ["http://sfbay.craigslist.org/npo/"]
def parse(self, response):
hxs = HtmlXPathSelector(response)
titles = hxs.select("//span[@class='pl']")
for titles in titles:
title = titles.select("a/text()").extract()
link = titles.select("a/@href").extract()
desc = titles.select("a/replylink").extract
print link, title
Any ideas how to do this ?