
I have a Python Scrapy scraper for an e-commerce site. At the moment it is unable to scrape the brand name and the brand logo image URL (I don't need to download the actual logo). The previously working code is:

        l.add_xpath("manufacturer", ".//img[contains(@class, 'product-brand-logo')]/@src")
        l.add_xpath("manufacturer_logo_image_url", ".//img[contains(@class, 'product-brand-logo')]/@src")

The source code for that section from the website is

<div class="product-price-details">
<div class="product-details">
<div class="product-brand-logo visible-xs visible-sm product-brand-logo--flex">
<a href="/vogue/_/a33-1" manual_cm_sp="PDP%20brand%20click-_-Vogue-_-D161">
<img class="product-brand-logo__image " data-src="https://media.testdom.com/asset/en/brand/large/vogue.jpg" alt="Vogue" title="Vogue" />
</a>
<div class="js-tooltip product-brand-logo__tooltip" data-tooltip="true">
<a href="/vogue/_/a33-1" manual_cm_sp="PDP%20brand%20click-_-Vogue-_-D161">
Browse our full Vogue range</a>

Please can someone help correct the previous code, which was working until now, to reflect the changes made to the site?

When the scraper runs, no error is shown; it just doesn't scrape the required data, and both columns are blank.

Thanks

jt9489
  • If you analyze by yourself you will find the problem – Wonka Oct 01 '19 at 14:29
  • Could you please provide a [mcve]? While it can be deduced that `l` is an `item loader`, it's not clear from the way you asked your question. Also, how did you try to debug your code? What have you tried and what failed? You should also read some of the [on-topic](https://stackoverflow.com/help/on-topic) help pages (see `1.`). – Chillie Oct 01 '19 at 14:31
  • Thank you for your kind replies. Please note I am not a Scrapy programmer and have no experience coding in Scrapy. I only use the scraper, but from time to time, if I run into issues, I can identify changes in the site's source code and update items.py, and that fixes it. This time I tried changing product-brand-logo to product-brand-logo__image in the XPath, as this is the only difference I can see, but it neither scraped the required data nor produced a syntax error. – jt9489 Oct 01 '19 at 15:17
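
A note on why changing only the class name may not help: in the markup quoted in the question, the `<img>` tag has a `data-src` attribute and no plain `src`, so an expression ending in `/@src` has nothing to return. The sketch below only lists the attributes of that tag using the standard library; `ImgAttrCollector` is a hypothetical helper, not part of Scrapy, and the closing `</div>` is added here so the fragment is complete.

```python
from html.parser import HTMLParser

# Markup copied from the question, trimmed to the brand-logo link.
SNIPPET = """
<div class="product-brand-logo visible-xs visible-sm product-brand-logo--flex">
<a href="/vogue/_/a33-1" manual_cm_sp="PDP%20brand%20click-_-Vogue-_-D161">
<img class="product-brand-logo__image " data-src="https://media.testdom.com/asset/en/brand/large/vogue.jpg" alt="Vogue" title="Vogue" />
</a>
</div>
"""


class ImgAttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            self.images.append(dict(attrs))


collector = ImgAttrCollector()
collector.feed(SNIPPET)
logo = collector.images[0]

print(sorted(logo))          # attribute names present on the <img>
print("src" in logo)         # whether a plain src attribute exists
print(logo.get("data-src"))  # the logo image URL
print(logo.get("title"))     # the brand name
```

If the live page matches the quoted markup, the logo URL would come from `data-src` and the brand name from `title` or `alt`.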

1 Answer


You can scrape the manufacturer logo URL with:

l.add_css('manufacturer_logo_image_url', '.product-details .product-brand-logo .product-brand-logo__image::attr(data-src)')

But I'm not sure why, in the previous example, manufacturer was scraped from the image src. If you need the manufacturer name, get it from the image title:

l.add_css('manufacturer', '.product-details .product-brand-logo .product-brand-logo__image::attr(title)')
sortas
  • Thanks, I tried adding the lines but I am getting IndentationError: unexpected indent – jt9489 Oct 01 '19 at 18:59
  • I fixed the error and the scraper runs. It is now scraping the manufacturer logo image URL, but not the manufacturer name. – jt9489 Oct 01 '19 at 19:09
  • Edited, use the second line :) – sortas Oct 01 '19 at 20:03
  • I added both lines, as I need both the brand name and the logo image URL in two columns of the export CSV. Thanks – jt9489 Oct 01 '19 at 21:50
  • Hi sortas, can you please clarify your comment "use the second line"? As I mentioned, I have tried it. Am I doing something wrong? – jt9489 Oct 03 '19 at 18:44
  • Previously I made a typo, so both lines were collecting 'manufacturer_logo_image_url', so I fixed the second one and now it collects 'manufacturer'. – sortas Oct 03 '19 at 19:30
  • Thanks, I tried the updated lines but it is still not getting the manufacturer; only the URL is being scraped. – jt9489 Oct 04 '19 at 10:33
  • The `::attr(title)` part in the second line should collect the img title, and can't collect the URL by any means. Check your code, or check the site; maybe the layout has changed. – sortas Oct 04 '19 at 10:43
  • Hi sortas, the source code is exactly the same, and in my code, apart from the lines mentioned in the question above, I can't see any other reference to the logo or to getting the manufacturer name from the title. – jt9489 Oct 04 '19 at 11:43