
I am trying to run a Lua script in Splash that performs a Google search and takes a screenshot of the search results. When I try to select the Google search box with an XPath or CSS selector in my Lua script, I get this error:

{
    "error": 400,
    "type": "ScriptError",
    "description": "Error happened while executing Lua script",
    "info": {
        "message": "[string \"function main(splash, args)\r...\"]:9: cannot select the specified element {'type': 'JS_ERROR', 'js_error_type': 'SyntaxError', 'js_error_message': 'SyntaxError: DOM Exception 12', 'js_error': 'Error: SyntaxError: DOM Exception 12', 'message': \"JS error: 'Error: SyntaxError: DOM Exception 12'\"}",
        "type": "SPLASH_LUA_ERROR",
        "splash_method": "select",
        "source": "[string \"function main(splash, args)\r...\"]",
        "line_number": 9,
        "error": "cannot select the specified element {'type': 'JS_ERROR', 'js_error_type': 'SyntaxError', 'js_error_message': 'SyntaxError: DOM Exception 12', 'js_error': 'Error: SyntaxError: DOM Exception 12', 'message': \"JS error: 'Error: SyntaxError: DOM Exception 12'\"}"
    }
}

This is my Lua script:

function main(splash, args)

  splash.private_mode_enabled = false
  splash:set_user_agent("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0")
  
  assert(splash:go(args.url))
  assert(splash:wait(1.0))

  search_box = assert(splash:select("//div[@class='a4bIc']/input"))
  search_box:focus()
  search_box:send_text('my user agent')
  search_box:send_keys('<Enter>')
  assert(splash:wait(2.0))
  
  return splash:png()
end

I tried setting custom headers and running the script in private mode, but nothing works. However, the same script runs without error and produces the correct output against duckduckgo.com. The problem appears only when the target URL is google.com. I think Google detects that the browser is being controlled by a bot (script) and disables access to the DOM tree.

Any idea how to make it work?

Hades
  • Perhaps you should check whether `args.url` is fetched at all, and that it is not a captcha. Google may analyze the User-Agent or recognise bots some other way. – Alexander Mashin Oct 11 '20 at 18:55
  • Yes @AlexanderMashin, the `args.url` is being fetched. When I comment out lines `9-12` of my code, the remaining code works as expected: it just returns a screenshot of the Google homepage. This means that the problem lies in accessing the DOM tree. – Hades Oct 12 '20 at 06:23

2 Answers


There's something wrong with your selector.

"//div[@class='a4bIc']/input"

Open the webpage, press F12, and use the inspector to find out which div class to target for that input field. It's also possible that the class name is generated on the fly to obfuscate it.
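
If the class name really is regenerated, one way to hedge against it is to try several candidate selectors in order. The sketch below assumes `splash:select` accepts CSS selectors and returns nil when nothing matches, as the Splash scripting reference describes. `select_first` is a hypothetical helper, not part of Splash, and `input[name='q']` is only an assumption about Google's markup.

```lua
-- Hypothetical helper (not part of Splash): try candidate CSS selectors
-- in order and return the first element that matches, plus its selector.
local function select_first(splash, selectors)
  for _, css in ipairs(selectors) do
    local element = splash:select(css)  -- nil when nothing matches
    if element then
      return element, css
    end
  end
  return nil, "no candidate selector matched"
end

-- Candidate list for the search box. Both entries are guesses that
-- should be checked against the live page in the inspector.
local SEARCH_BOX_CANDIDATES = {
  "input[name='q']",   -- assumption: the field keeps name="q"
  "div.a4bIc input",   -- class name from the question; may be regenerated
}
```

Inside `main`, `local search_box = assert(select_first(splash, SEARCH_BOX_CANDIDATES))` would then fail with a readable message when none of the candidates match.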

Doyousketch2
  • I checked and cross verified using the inspector tools, but unfortunately the selector is correct. I even copied the full xpath/css selector from inspector tool and used that in my script, but got the same error. – Hades Oct 22 '20 at 14:36

Maybe the page hasn't fully downloaded or rendered yet.

function main(splash, args)
    splash.private_mode_enabled = false
    splash:set_user_agent("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0")

    -- no assert here, so a failed load reaches the else branch below
    local ok, reason = splash:go(args.url)

    if ok then
        local wait, increment, maxwait = 0, 0.1, 10
        while wait < maxwait and not splash:select("//div[@class='a4bIc']/input") do
            splash:wait(increment)  --  wait until it exists, or times out
            wait = wait + increment
        end
        if wait >= maxwait then
            print('Timed out')
        else
            local search_box = splash:select("//div[@class='a4bIc']/input")
            search_box:focus()
            search_box:send_text('my user agent')
            search_box:send_keys('<Enter>')
            splash:wait(2.0)
            return splash:png()
        end
    else
        print( reason )  --  see if it tells you why
    end
end
Doyousketch2
  • I executed the above script, but unfortunately got the same error again - line 9 : cannot select the specified element. This means that the page is being downloaded / rendered. – Hades Oct 23 '20 at 07:29
  • Try `splash:select( 'div.a4bIc input.gLFyf.gsfi' )`; that's how their CSS selector shows up in my browser. – Doyousketch2 Oct 23 '20 at 07:45
  • Awesome! It works now. Thanks. But I still don't understand why it didn't work with the xpath selector. Try to copy paste this xpath `//div[@class='a4bIc']/input` in your browser inspector tool. And let me know if this selects the same element. If yes then what could be the reason this doesn't work in the lua script? – Hades Oct 24 '20 at 08:10
  • not sure, was just a hunch 'cuz that's how scripts are in Stylus as well. `add0n.com/stylus.html` – Doyousketch2 Oct 24 '20 at 08:29
  • ...which is what I said in the first place - https://stackoverflow.com/a/64461369/3342050 – Doyousketch2 Oct 24 '20 at 08:46
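
A likely explanation for the XPath failure, going by the Splash scripting reference: `splash:select` takes a CSS selector, not an XPath expression. A string such as `//div[@class='a4bIc']/input` is not valid CSS, so the selector lookup fails with a syntax error, which matches the `SyntaxError: DOM Exception 12` in the question's error output. The browser inspector accepts both XPath and CSS in its search box, which is why the same expression appears to work there. Below is a minimal sketch of the question's script with the CSS selector from the comment above swapped in. The class names are taken from that comment and will probably change over time.

```lua
-- Sketch: same flow as the question's script, using a CSS selector.
function main(splash, args)
  splash.private_mode_enabled = false
  splash:set_user_agent("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0")

  assert(splash:go(args.url))
  assert(splash:wait(1.0))

  -- splash:select expects CSS; it returns nil when nothing matches,
  -- so assert turns a missing element into a script error.
  -- Class names come from the comment thread and may be regenerated.
  local search_box = assert(splash:select("div.a4bIc input.gLFyf.gsfi"))
  search_box:focus()
  search_box:send_text("my user agent")
  search_box:send_keys("<Enter>")
  assert(splash:wait(2.0))

  return splash:png()
end
```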