I use Scrapy + Splash to scrape data from news sites. For one of them (Farhikhtegan Daily), I can render text/image-only pages, but pages containing a specific type of video fail to render with an error ("webkit203").
Details:
The Lua script that I usually expect to work is:
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(2))
  return splash:html()
end
For instance, this page from the above-mentioned site renders perfectly with this script, but this one (which contains that specific type of video element) doesn't, and I get this error:
{ "error": 400, "type": "ScriptError", "description": "Error happened while executing Lua script", "info": { "source": "[string "function main(splash, args)\r..."]", "line_number": 2, "error": "webkit203", "type": "LUA_ERROR", "message": "Lua error: [string "function main(splash, args)\r..."]:2: webkit203" } }
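The error is reported at line 2 of the script, which is the `assert(splash:go(args.url))` call. As a diagnostic sketch only (not one of the variations I list below), the same script can be written without asserting on `splash:go`, so that the failure reason is returned together with whatever HTML was loaded. It relies on `splash:go` returning `ok, reason`; the `error` and `html` keys in the returned table are names I picked for illustration.

```lua
function main(splash, args)
  -- splash:go returns ok, reason; on failure ok is nil and reason is a string
  local ok, reason = splash:go(args.url)
  if not ok then
    -- Return the failure reason along with any HTML that was loaded
    return {error = reason, html = splash:html()}
  end
  assert(splash:wait(2))
  return {html = splash:html()}
end
```

Written this way, the script no longer aborts at line 2, which makes it possible to see whether any part of the page was rendered before the failure.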
I have tried many variations of the script, including enabling/disabling these Splash attributes:
js_enabled
images_enabled
webgl_enabled
html5_media_enabled
media_source_enabled
plugins_enabled
private_mode_enabled
In desperation, I also tried deleting the video element altogether by adding a JS function to the script, as follows:
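function main(splash, args)
  -- JS helper that removes the first element with class 'videodiv', if any
  local rm = splash:jsfunc([[
    function () {
      var vids = document.getElementsByClassName('videodiv');
      if (vids.length > 0) {
        vids[0].remove();
      }
      return true;
    }
  ]])
  assert(splash:go(args.url))
  assert(splash:wait(2))
  -- Call the helper once the page has loaded
  rm()
  return splash:html()
end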
which, of course, made no difference. All of the above attempts led to the same error quoted above.
So, is there anything that I might have missed?
Thanks for your help.