Scrapy Xpath: Extracting @title from img node

Question

I want to extract the @title from the Main Notes According to Your Votes section from this page: https://www.fragrantica.com/perfume/Remy-Latour/Cigar-9351.html

I fetched the HTML, then tried this line of code in the Scrapy shell, but the output was None:

response.xpath('//*[@id="userMainNotes"]/div/img/@title').extract_first()

What am I doing wrong?

score 2 · Answer 1 · answered Sep 09 '18 at 10:03

If you check the page source (Ctrl+U), you'll find:

<div title="96:241;171:117;33:103;34:103;41:70;128:63;4:59;182:59;170:58;75:56;191:48;21:39;77:39;14:28" id="userMainNotes">Loading...</div>

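That means the contents of the <div> above are rendered by JavaScript, so the img nodes you see in the browser do not exist in the HTML that Scrapy downloads. That's why your code doesn't work.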
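Since the title attribute of that div is present in the raw HTML, it can be read directly. The sketch below splits such a string into pairs of integers. Reading each pair as a note id followed by a vote count is only an assumption based on the section name, not something the page source confirms, and parse_note_pairs is a hypothetical helper name.

```python
def parse_note_pairs(title):
    """Split an 'a:b;c:d' string into a list of (int, int) tuples."""
    pairs = []
    for chunk in title.split(";"):
        if not chunk:
            continue  # tolerate a trailing ';'
        left, right = chunk.split(":")
        pairs.append((int(left), int(right)))
    return pairs


# In the Scrapy shell the raw string would come from:
#   response.xpath('//*[@id="userMainNotes"]/@title').extract_first()
raw_title = "96:241;171:117;33:103"  # shortened copy of the attribute shown above
note_pairs = parse_note_pairs(raw_title)
print(note_pairs)
```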

gangabass

Thank you gangabass. What efficient methods do you suggest for crawling JavaScript-heavy pages? – Anh Quoc Vo Sep 10 '18 at 06:20

score 0 · Accepted Answer · answered Sep 09 '18 at 14:01

This will work:

response.xpath('//span[contains(@id, "note")]/img[@rel]/@title')
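To spell out the expression: //span[contains(@id, "note")] matches span elements anywhere in the document whose id attribute contains the text note, /img[@rel] keeps only their img children that have a rel attribute, and /@title returns the title attribute of each. The sketch below illustrates the [@rel] predicate with the standard library only. The sample markup is invented for illustration and is not copied from the page. ElementTree's limited XPath subset has no contains() or /@title, so those two parts are done in plain Python here.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, made up for this example.
SAMPLE = """
<div>
  <span id="note1"><img rel="1" title="tobacco"/></span>
  <span id="note2"><img title="no rel attribute"/></span>
  <span id="other"><img rel="3" title="outside a note span"/></span>
</div>
"""

root = ET.fromstring(SAMPLE)

titles = []
for span in root.iter("span"):
    # Plain-Python equivalent of contains(@id, "note")
    if "note" not in span.get("id", ""):
        continue
    # img[@rel] selects only img children that carry a rel attribute
    for img in span.findall("img[@rel]"):
        titles.append(img.get("title"))

print(titles)
```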

Do not forget to set USER_AGENT in your settings.py.
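A minimal sketch of that setting. USER_AGENT is a built-in Scrapy setting, but the value below is only a placeholder; substitute whatever string identifies your crawler or matches the browser you want to mimic.

```python
# settings.py of the Scrapy project
# The value is a placeholder chosen for illustration.
USER_AGENT = "Mozilla/5.0 (compatible; my-crawler/0.1)"
```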

Yash Pokar

Thank you Yash, your code worked. However, may I ask you to clarify the logic behind your line of code? Especially what the img[@rel] part stands for ... – Anh Quoc Vo Sep 10 '18 at 06:23
@AnhQuocVo you're welcome. Sure, I can explain the logic behind it. – Yash Pokar Sep 10 '18 at 06:26
You probably wrote that XPath according to the HTML node arrangement shown in the Chrome/Firefox developer inspect tool. Correct me if I'm wrong. That arrangement isn't always the same as what you get in your response, because the browser has already processed the page and rearranged the nodes, which your lower-level Python request can't do. – Yash Pokar Sep 10 '18 at 06:47
Yes, you are correct, I copied the XPath straight from the inspect tool. For some other elements, such as the name of the product, this worked ... but not for the perfume notes. I have chosen your answer as the solution to this question; however, my reputation is low, so I cannot upvote. – Anh Quoc Vo Sep 10 '18 at 06:50
Now, if you want to write a 100% correct XPath, you have to save the response and write the XPath according to that response body. You could refer to this: https://medium.com/@yashpokar/scrape-any-website-in-the-internet-without-using-splash-or-selenium-68a6c9733369 – Yash Pokar Sep 10 '18 at 06:53
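
A minimal sketch of the "save the response" step from the last comment. In the Scrapy shell, response.body holds the raw bytes the server sent. The helper name save_body and the file name are hypothetical, and a stand-in byte string is used so the snippet runs outside Scrapy.

```python
from pathlib import Path


def save_body(body, path="response.html"):
    """Write the raw response bytes to disk and return the file path."""
    target = Path(path)
    target.write_bytes(body)
    return target


# In the Scrapy shell you would pass response.body instead of this stand-in.
demo_body = b'<div id="userMainNotes">Loading...</div>'
saved = save_body(demo_body, "response_demo.html")
print(saved.read_bytes() == demo_body)
```

Opening the saved file in an editor shows the markup Scrapy actually received, which is what the XPath has to match.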
