I'm a newbie and I'm trying to scrape the href link of each place listed on this website. I then want to follow each link and scrape data from it, but I can't even get the href links with this code. However, the same XPath selector does return the hrefs when I use it in the Scrapy shell.
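To make the goal concrete, here is a minimal standalone sketch of the link-extraction step I'm after. It uses only the standard library and a made-up HTML snippet: the markup, the `/us-en/property/example-…` paths, and the names `HrefCollector` and `extract_links` are all hypothetical and not taken from the real page or from Scrapy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for all <a href> values found in html."""
    collector = HrefCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]


# Hypothetical markup, for illustration only; the real page differs.
SAMPLE_HTML = """
<div class="ssr__container">
  <a href="/us-en/property/example-1">Example 1</a>
  <a href="/us-en/property/example-2">Example 2</a>
</div>
"""

links = extract_links(
    SAMPLE_HTML, "https://powersearch.jll.com/us-en/property/search"
)
```

In the spider, I would then expect to yield one further request per URL in `links`, each with its own callback for the detail page.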

    import scrapy
    from scrapy_splash import SplashRequest


    class TestspiSpider(scrapy.Spider):
        name = 'testspi'
        allowed_domains = ["powersearch.jll.com"]
        start_urls = ["https://powersearch.jll.com/us-en/property/search"]

        def start_requests(self):
            for url in self.start_urls:
                yield SplashRequest(url=url, callback=self.parse, args={'wait': 5})

        def parse(self, response):
            properties = response.xpath('//*[@class="ssr__container"]').extract()
            print(properties)
            print("HELLO WORLD")

When I run the code, I get an empty list. Here's the output:
2020-09-03 19:58:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-09-03 19:58:49 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-09-03 19:58:49 [py.warnings] WARNING: /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/scrapy_splash/request.py:41: ScrapyDeprecationWarning: Call to deprecated function to_native_str. Use to_unicode instead.
url = to_native_str(url)
2020-09-03 19:58:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://powersearch.jll.com/us-en/property/search via http://localhost:8050/render.html> (referer: None)
[]
HELLO WORLD
2020-09-03 19:58:59 [scrapy.core.engine] INFO: Closing spider (finished)
2020-09-03 19:58:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 535,
'downloader/request_count': 1,
'downloader/request_method_count/POST': 1,
'downloader/response_bytes': 148739,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 9.802616,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 9, 3, 14, 28, 59, 274213),
'log_count/DEBUG': 1,
'log_count/INFO': 10,
'log_count/WARNING': 1,
'memusage/max': 51179520,
'memusage/startup': 51179520,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'splash/render.html/request_count': 1,
'splash/render.html/response_count/200': 1,
'start_time': datetime.datetime(2020, 9, 3, 14, 28, 49, 471597)}
2020-09-03 19:58:59 [scrapy.core.engine] INFO: Spider closed (finished)
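The response is about 148 KB, so something was rendered. One check I could run on the response body is whether any element carrying that class is in the returned HTML at all. The sketch below is only an assumption about how to test that outside Scrapy: `ClassCounter` and `count_elements_with_class` are hypothetical helper names, and both HTML strings are invented. Note that it matches the class as one token among several, which is looser than the exact `@class="ssr__container"` comparison in my XPath.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty; treat both as no classes.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Return how many elements in html carry class_name."""
    counter = ClassCounter(class_name)
    counter.feed(html)
    return counter.count


# Invented examples: a page where the listings were rendered, and a bare shell.
rendered = '<div class="ssr__container listing"><a href="/x">x</a></div>'
empty_shell = '<div id="app"></div>'

rendered_count = count_elements_with_class(rendered, "ssr__container")
shell_count = count_elements_with_class(empty_shell, "ssr__container")
```

If the count on the real response body were zero, that would suggest the listings had not been rendered when Splash returned the page; a non-zero count would point at the selector instead.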
Please help me fix this.