Scrapy Response 204 No-Content

Question

I have a Scrapy spider that worked as expected for a while but now returns an empty response.

import scrapy


class BossSpider(scrapy.Spider):
    name = 'bossaz'
    allowed_domains = ['boss.az']
    start_urls = ['https://boss.az/vacancies']

    def parse(self, response):
        for href in response.xpath('//a[@class="results-i-link"]/@href'):
            yield response.follow(href, self.parse_jobs)

        next_page = response.xpath('//span[@class="next"]/a[@rel="next"]/@href').extract_first()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

    def parse_jobs(self, response):
        scraped_data = dict()
        scraped_data['job_title'] = response.xpath('//h1[@class="post-title"]/text()').extract_first()
        scraped_data['employer'] = response.xpath('//a[@class="post-company"]/text()').extract_first()
        scraped_data['published'] = response.xpath('//div[@class="bumped_on params-i-val"]/text()').extract_first()
        scraped_data['details'] = response.xpath('//div[@class="post-cols post-info"]').extract()
        yield scraped_data

The code above currently produces the following stats when I run the spider on my machine:

{'downloader/request_bytes': 431,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 304,
 'downloader/response_count': 2,
 'downloader/response_status_count/204': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 8, 30, 5, 30, 18, 860994),
 'log_count/DEBUG': 3,
 'log_count/INFO': 7,
 'memusage/max': 53403648,
 'memusage/startup': 53403648,
 'response_received_count': 2,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2018, 8, 30, 5, 30, 17, 554091)}

I also tried fetching the page in the terminal with scrapy shell https://boss.az/vacancies. There, response.body is also an empty string. Note that I checked the website's HTML and there is no structural change. What could cause this spider to receive HTTP status 204?

It works for me. I'm using USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36" in settings. - Hassan Raza, Aug 30 '18 at 10:06
It is working now after I added that line to the settings file, but I still don't understand the reason. I have other spiders in the same project and they work properly without changing USER_AGENT. Anyway, thanks for the help! - Elgin Cahangirov, Aug 30 '18 at 10:28
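
For readers who land here, the fix from the comments can be sketched as below. The user agent string is the one quoted above. The explanation is an assumption that the thread does not confirm: the site may have started answering Scrapy's default user agent (which identifies itself as Scrapy) with an empty 204, which would also explain why other spiders against other sites were unaffected. The name BOSS_CUSTOM_SETTINGS is hypothetical and only illustrates the per-spider variant.

```python
# settings.py: project-wide override of Scrapy's default user agent,
# which otherwise looks like "Scrapy/<version> (+https://scrapy.org)".
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/68.0.3440.106 Safari/537.36"
)

# Alternative (hypothetical name): to limit the change to one spider,
# assign a dict like this to BossSpider.custom_settings instead of
# editing settings.py, so the other spiders keep the default.
BOSS_CUSTOM_SETTINGS = {"USER_AGENT": USER_AGENT}
```

The per-spider form is the less invasive choice when only one target site rejects the default user agent.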
