
I have been using Scrapy with Splash to scrape images from different websites. Some pages load their images dynamically, and I can't get them to load fully: in the rendered HTML, the 'src' attribute of the image tags is missing.

I started out calling Splash from Scrapy, but to isolate the problem I switched to running the script directly in the Splash web interface.

I have tried everything suggested in the Splash FAQ (https://splash.readthedocs.io/en/latest/faq.html#website-is-not-rendered-correctly), but the images still don't load.


I ran into this on https://decathlon.es, and I don't know whether I'll hit the same problem on other sites as well.

This is the script that I used to render the page:

function main(splash, args)
  -- Rendering options, set before the page is loaded
  splash.private_mode_enabled = false
  splash.images_enabled = true
  splash:set_user_agent("Different User Agent")
  splash.plugins_enabled = true
  splash.html5_media_enabled = true

  -- Load the page and give its scripts time to run
  assert(splash:go(args.url))
  assert(splash:wait(3.5))

  -- Resize the viewport to cover the whole page, then wait again
  local width, height = splash:set_viewport_full()
  assert(splash:wait(3.5))

  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
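
To make the symptom concrete, here is a minimal sketch of how the `html` value returned by the script can be checked for image tags without a usable 'src'. It only uses Python's standard library. The sample markup, the `data-src` attribute and the `audit_images` helper are assumptions for illustration; I have not confirmed which attributes decathlon.es actually uses.

```python
from html.parser import HTMLParser


class ImgSrcAudit(HTMLParser):
    """Collect <img> tags and split them by whether `src` is populated."""

    def __init__(self):
        super().__init__()
        self.with_src = []      # values of non-empty src attributes
        self.without_src = []   # attribute dicts of <img> tags lacking src

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src:
            self.with_src.append(src)
        else:
            # Missing or empty src: keep all attributes for inspection
            self.without_src.append(attr_map)


def audit_images(html):
    """Hypothetical helper: return (urls_with_src, attrs_of_imgs_without_src)."""
    parser = ImgSrcAudit()
    parser.feed(html)
    parser.close()
    return parser.with_src, parser.without_src


# Made-up sample standing in for the HTML returned by Splash
sample = '<img src="/a.jpg"><img data-src="/b.jpg" class="lazy"><img src="">'
loaded, missing = audit_images(sample)
print(len(loaded), "with src;", len(missing), "without src")
for attrs in missing:
    print(attrs)
```

Running this against the real Splash output would show which attributes the unloaded images carry instead of 'src', if any.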
