I'm quite new to Splash, and though I was able to set it up on Ubuntu 18 (via the Splash Docker image), it gives me a different result for this page: https://www.overstock.com/Home-Garden/Area-Rugs/31446/subcat.html

Normally it's rendered like so: [image]

But when I try to render it in Splash, it renders like this: [image]

I have tried changing the user agent in Splash to this:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36

With that change, the Splash script looks like this:

function main(splash, args)
  splash:set_user_agent(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'
  )
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end

Yet, despite these additions, it still fails to render the page.

How can I get Splash to render this page?

Gallaecio
rom
  • The Docker version of Splash is pretty outdated. If you want a quick solution, go with scrapy-selenium. If you want Splash, you have to install it manually and modify it. – wishmaster Aug 28 '20 at 00:55
  • @wishmaster, do you mean putting scrapy-selenium into the same Docker container as Splash? – rom Aug 28 '20 at 19:25
  • No need for Splash at all if you go with Selenium (scrapy-selenium). – wishmaster Aug 28 '20 at 19:57

1 Answer

It seems that overstock.com requires the Connection and Accept headers. Add them to your request and it should work as expected. I tested this in Postman with and without the Connection: keep-alive and Accept: */* headers. Without them, I get the same error page you do:

[image]

After adding the two headers above:

[image]

Therefore, your script should be edited accordingly:

function main(splash, args)
  splash:set_custom_headers({
    ["Connection"] = "keep-alive",
    ["Accept"] = "*/*",
  })
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
noamyg
  • I still got the same result after using this. – rom Sep 02 '20 at 22:06
  • @rom, my guess is still that you're missing a request header. Try an API development tool such as Postman to reproduce the issue outside of Splash, and experiment with the headers. – noamyg Sep 03 '20 at 05:55
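
If Postman is not at hand, the same experiment can be scripted. The following is only a rough Python sketch of that idea, using the standard library: it sends the page request once for every combination of the headers mentioned in this thread and reports the status code and body size, so you can see which combination (if any) changes the response. The helper names `header_subsets`, `probe`, and `run_probes` are made up for this illustration, the sketch assumes an https URL, and it has not been verified against overstock.com, so no particular outcome is implied.

```python
import http.client
import itertools
from urllib.parse import urlsplit

# URL and header values are copied from the question and the answer.
URL = "https://www.overstock.com/Home-Garden/Area-Rugs/31446/subcat.html"
CANDIDATE_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
    ),
    "Connection": "keep-alive",
    "Accept": "*/*",
}


def header_subsets(headers):
    """Yield every subset of the given headers, from none to all."""
    names = list(headers)
    for size in range(len(names) + 1):
        for combo in itertools.combinations(names, size):
            yield {name: headers[name] for name in combo}


def probe(url, headers, timeout=10):
    """Send one GET request and return (status code, body length in bytes)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # http.client adds only Host and Accept-Encoding on its own, so the
    # headers passed in are close to everything the server sees.
    connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        connection.request("GET", path, headers=headers)
        response = connection.getresponse()
        return response.status, len(response.read())
    finally:
        connection.close()


def run_probes(url=URL, candidates=CANDIDATE_HEADERS):
    """Print the outcome for each header combination (performs network I/O)."""
    for subset in header_subsets(candidates):
        print(sorted(subset), probe(url, subset))
```

Calling `run_probes()` prints one line per header combination. If a combination returns a different status or a much larger body than the rest, those are the headers worth passing to `splash:set_custom_headers`. If every combination returns the same thing, the difference is more likely down to how Splash renders the page than to the request headers.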