
I was scraping job listings from the portal below using Scrapy. In the Scrapy shell I get only 10 items for a class selector that matches 15 items in Chrome DevTools and in SelectorGadget. I don't understand why the counts differ.

Page tested on: https://www.waahjobs.com/s/software-developer-jobs-in-mumbai/

Selector chosen with the SelectorGadget extension: .r-95jzfe .css-1dbjc4n .r-1pn2ns4

Number of items: 15 (also counted manually).

Scrapy shell input:

scrapy shell "https://www.waahjobs.com/s/software-developer-jobs-in-mumbai/"

obj = response.css(".r-95jzfe .css-1dbjc4n .r-1pn2ns4")
print(len(obj))

Scrapy shell output: 10

Expected output: 15
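Scrapy's response.css() runs against the HTML the server returned, before any JavaScript executes, whereas DevTools and SelectorGadget inspect the live DOM after scripts have added more cards. The stdlib sketch below mimics what Scrapy sees: it counts elements matching a descendant class chain in a raw HTML string, so cards that a script would inject later are simply absent. The sample HTML is made up for illustration; in the Scrapy shell you could pass it response.text instead. It is a rough parser that assumes explicitly closed tags, not a replacement for Scrapy's selectors.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ClassChainCounter(HTMLParser):
    """Count elements matching a descendant chain of classes, e.g. '.a .b .c'.

    Rough sketch: assumes every non-void tag is explicitly closed.
    """

    def __init__(self, chain):
        super().__init__()
        self.chain = chain  # class names, outermost first
        self.stack = []     # class sets of the currently open ancestors
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        # The element must carry the last class; its ancestors the earlier ones.
        if self.chain[-1] in classes and self._ancestors_match():
            self.count += 1
        if tag not in VOID:
            self.stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def _ancestors_match(self):
        # Walk open ancestors from innermost outward, consuming the chain
        # right to left (greedy matching is correct for descendant combinators).
        i = len(self.chain) - 2
        for classes in reversed(self.stack):
            if i < 0:
                break
            if self.chain[i] in classes:
                i -= 1
        return i < 0


def count_matches(html, selector):
    parser = ClassChainCounter([part.lstrip(".") for part in selector.split()])
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up server response: two cards in the HTML, the rest added later by JS.
RAW_HTML = """
<div class="r-95jzfe">
  <div class="css-1dbjc4n">
    <div class="r-1pn2ns4">card 1</div>
    <div class="r-1pn2ns4">card 2</div>
  </div>
</div>
<script>/* more cards would be injected here by JavaScript */</script>
"""

print(count_matches(RAW_HTML, ".r-95jzfe .css-1dbjc4n .r-1pn2ns4"))  # → 2
```

Running Scrapy's view(response) on the real page shows the same effect visually: only the server-rendered cards are present.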

Update: I bypassed the need to scrape the rendered page by requesting the data directly from the site's backend. This tool is useful for converting a cURL request into Scrapy code: https://michael-shub.github.io/curl2scrapy/

However, I am still having problems with some websites, even with scrapy-splash.

What I did:

  1. Integrated Scrapy with Splash (scrapy-splash).
  2. Started Splash on localhost using Docker.
  3. Ran fetch('http://localhost:8050/render.html?url=https://www.hirist.com/login') in the Scrapy shell.
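For reference, the scrapy-splash README describes the integration in step 1 as a few settings.py entries, with Splash itself running in Docker as in step 2. A sketch of that configuration; the middleware priorities are the values the README suggests, but it is worth checking them against the installed scrapy-splash version.

```python
# settings.py (sketch based on the scrapy-splash README)
# Splash started with: docker run -p 8050:8050 scrapinghub/splash

SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    # HttpCompressionMiddleware must run after SplashMiddleware.
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filtering and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With this in place, a spider sends scrapy_splash.SplashRequest(url, self.parse, args={"wait": 2}) instead of a plain Request; fetching render.html by hand in the shell is then only needed for manual testing.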

Result: view(response) shows a 404 page in Chrome.

Observation: the same approach works for https://quotes.toscrape.com/ but not for https://www.hirist.com.

The screenshot I took shows that Splash cannot load the page. The HTML is not readable either, although it does contain the correct data.
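When calling Splash's render.html endpoint by hand, it is safer to build the query string with urllib so the target URL is percent-encoded, and to pass a wait value so the page's scripts get time to run before Splash returns the HTML. A minimal sketch; the 2-second wait is an arbitrary choice, and this is not a diagnosis of the 404 above, just one variable worth ruling out.

```python
from urllib.parse import urlencode

SPLASH_RENDER = "http://localhost:8050/render.html"  # Splash from the Docker step


def splash_render_url(target, wait=2.0, timeout=30):
    """Build a render.html URL with the target URL percent-encoded."""
    # wait: seconds Splash pauses after the page loads, so JavaScript can run.
    # timeout: overall render timeout in seconds (capped by Splash's --max-timeout).
    query = urlencode({"url": target, "wait": wait, "timeout": timeout})
    return f"{SPLASH_RENDER}?{query}"


print(splash_render_url("https://www.hirist.com/login"))
# → http://localhost:8050/render.html?url=https%3A%2F%2Fwww.hirist.com%2Flogin&wait=2.0&timeout=30
```

In the Scrapy shell the result can be passed straight to fetch(), then inspected with view(response).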

Kindly help.

  • The same thing happens with https://www.waahjobs.com/s/software-developer-jobs-in-banglore/: I get 10 cards with Scrapy, but the website actually shows 20. – Avinash Nagar Jun 04 '21 at 05:39
  • Welcome to SO! This is because the site uses JavaScript to load the data. The Scrapy docs have suggestions for how to deal with that [here](https://docs.scrapy.org/en/latest/topics/dynamic-content.html). You'll also find suggestions on this site. – tomjn Jun 04 '21 at 08:18
  • Thank you. I realised the same when I viewed the page as Scrapy sees it, using view(response). – Avinash Nagar Jun 05 '21 at 07:36
  • Update: I bypassed the need to scrape the data by requesting it directly from the backend. This link is useful for converting a cURL request to Scrapy code: https://michael-shub.github.io/curl2scrapy/ However, I am still facing problems on some websites even after using scrapy-splash. What I did: 1. Integrated Scrapy with Splash 2. Started Splash on localhost using Docker – Avinash Nagar Jun 05 '21 at 11:01

0 Answers