
I use Scrapy + Splash to scrape data from news sites. For one of them (Farhikhtegan Daily), I have managed to render text/image-only pages, but pages containing a specific type of video cannot be rendered because of an error ("webkit203").

Details:

The Lua script I normally use, and which I would expect to work here, is:

function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(2))
  return splash:html()
end

For instance, this page from the site renders perfectly with this script, but this one (which contains that type of video element) does not, and I get this error:

{
  "error": 400,
  "type": "ScriptError",
  "description": "Error happened while executing Lua script",
  "info": {
    "source": "[string "function main(splash, args)\r..."]",
    "line_number": 2,
    "error": "webkit203",
    "type": "LUA_ERROR",
    "message": "Lua error: [string "function main(splash, args)\r..."]:2: webkit203"
  }
}
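For reference, line 2 of the script is assert(splash:go(args.url)). On a load failure splash:go() returns nil plus a reason string, and it is the surrounding assert() that turns that reason ("webkit203") into the Lua error above. A minimal sketch that drops the assert and reports the reason together with whatever HTML Splash holds at that point; whether that HTML is complete or usable for this site is untested:

```lua
-- Sketch: load the page without assert() so a load failure does not abort the script.
-- On failure, splash:go() returns nil and a reason string such as "webkit203".
function main(splash, args)
  local ok, reason = splash:go(args.url)
  splash:wait(2)
  -- Returning a table makes Splash respond with JSON containing all three fields.
  return {
    ok = (ok ~= nil),
    reason = reason,        -- nil (omitted) when the load succeeded
    html = splash:html(),   -- may be partial if the load failed
  }
end
```

Inspecting reason this way at least shows whether the page content arrives despite the error, instead of the script stopping at line 2.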

I have tried many variations of the script, including enabling and disabling each of these Splash attributes:

js_enabled, images_enabled, webgl_enabled, html5_media_enabled, media_source_enabled, plugins_enabled, private_mode_enabled
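These attributes are properties of the splash object and take effect when they are assigned before splash:go() is called. A sketch of one such variation, with media, plugin and WebGL support switched off; it only illustrates how the toggles are set, since none of the combinations tried so far avoided webkit203:

```lua
-- Sketch: switch off media-related features before loading the page.
-- This particular combination is just one example, not a known fix.
function main(splash, args)
  splash.html5_media_enabled = false
  splash.media_source_enabled = false
  splash.plugins_enabled = false
  splash.webgl_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(2))
  return splash:html()
end
```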

I also tried removing the video element altogether by adding a JS function to the script:

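function main(splash, args)
  -- JS helper that removes the first element with class "videodiv", if any
  local rm = splash:jsfunc([[
    function () {
      var vids = document.getElementsByClassName('videodiv');
      if (vids.length > 0) { vids[0].remove(); }
      return true;
    }
  ]])
  assert(splash:go(args.url))
  assert(splash:wait(2))
  rm()  -- run the helper once the page has loaded
  return splash:html()
end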

This made no difference, presumably because the error is raised by splash:go() itself, before the page has finished loading. All of the above attempts end with the same error quoted above.
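A jsfunc can only act on the DOM after splash:go() has returned, so it cannot prevent a failure that happens during the load. A different technique, blocking requests at the network level with splash:on_request() and request:abort(), runs before the resources are fetched. A sketch, assuming the video is loaded as a separate request whose URL contains a recognisable extension; the patterns in VIDEO_PATTERNS are hypothetical guesses, and whether this avoids webkit203 at all is untested:

```lua
-- Sketch: abort requests that look like video files before the page loads them.
-- VIDEO_PATTERNS is a hypothetical list; the site's real video URLs are not known.
local VIDEO_PATTERNS = {"%.mp4", "%.m3u8", "%.webm"}

local function looks_like_video(url)
  for _, pat in ipairs(VIDEO_PATTERNS) do
    if url:find(pat) then
      return true
    end
  end
  return false
end

function main(splash, args)
  -- The callback is invoked for every outgoing request made by the page.
  splash:on_request(function(request)
    if looks_like_video(request.url) then
      request:abort()
    end
  end)
  assert(splash:go(args.url))
  assert(splash:wait(2))
  return splash:html()
end
```

If the error comes from the main page navigation rather than from a sub-request, this filter will not help, which would itself narrow down the cause.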

Is there anything I might have missed?

Thanks for your help.

Roozbeh
