I'm trying to write a spider with Scrapy that will ideally return a list of URLs for pages on a site that contain a certain CSS class, which I would select with response.css(".class"). However, I'm not sure this is even possible when the class only appears on the page if a user is logged in.
I've gone through guides on how to write spiders with Scrapy and, as a sanity check that I didn't write it wrong, I've gotten the spider to return a list of selectors using a different class that I know is on the page whether or not a user is logged in. I really just want to know whether this is possible at all and, if so, what steps I can take to get there.
import scrapy


class TestSpider(scrapy.Spider):
    name = 'testspider'
    allowed_domains = ['www.example.com']
    start_urls = ['http://example.com/']

    def parse(self, response):
        # Python 3 print; logs the selectors matching the class
        print(response.css(".class"))
The code I have so far is obviously very basic and barely changed from the generated template, since I'm still in the early testing phase. Ideally I'd get a list of selectors that would then give me the URL of each page where the class is found. All I'm looking for is the URLs of the pages that contain the defined class.