
Recently, I've been having trouble driving phantomjs under RSelenium. It seems that the browser is unable to locate anything on the page using findElement(). If I pass something as simple as:

library("RSelenium")
RSelenium::checkForServer()
RSelenium::startServer()
rd <- remoteDriver(browserName = "phantomjs")
rd$open()
Sys.sleep(5)

rd$navigate("https://www.Facebook.com")
searchBar <- rd$findElement(using = "id", "email")

I get the error below:

Error:   Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException

Any thoughts on what is causing this? It doesn't seem to matter what page I navigate to; it fails any time I try to locate an element on the webpage. The issue started recently, and I noticed it when my cron jobs began failing.

I'm working in Ubuntu 14.04 LTS with R 3.3.1 and phantomjs 2.1.1. I don't suspect some type of compatibility issue as this has worked very recently and I haven't updated anything.

ander2ed
  • Google was a simple working example. R throws an error regardless of the site. I can update the question to exclude Google, but that seems irrelevant – ander2ed Aug 23 '16 at 02:17
  • Apologies if you got that impression. I was never actually scraping Google for content, and was unaware of the violation. Question updated to exclude Google. – ander2ed Aug 23 '16 at 02:24

1 Answer

The version of phantomjs you installed may be a limited build. The Debian/Ubuntu package lists these changes from upstream:

  • Disabled Ghostdriver due to pre-built source-less Selenium blobs.
  • Added README.Debian explaining differences from upstream "phantomjs".

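If you installed or upgraded phantomjs recently using apt-get then this is most likely the case: with Ghostdriver disabled, WebDriver calls such as findElement() fail. Instead, you can download the official binary from the phantomjs website and add its bin directory to your PATH.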
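To check which case applies, something like the following sketch may help. It reports which phantomjs is found first on PATH and whether dpkg claims ownership of that file. The function name `phantomjs_origin` is made up for illustration, and the check assumes a Debian/Ubuntu system where `dpkg -S` can map a file to a package; treat "packaged" as a hint, not proof, that you have the limited build.

```shell
#!/bin/sh
# Hypothetical helper (not part of phantomjs or RSelenium): print where the
# first phantomjs on PATH lives and whether a dpkg package owns that file.
phantomjs_origin() {
    bin=$(command -v phantomjs 2>/dev/null) || { echo "not-found"; return 1; }
    # dpkg -S succeeds only if some installed package owns the path
    if command -v dpkg >/dev/null 2>&1 && dpkg -S "$bin" >/dev/null 2>&1; then
        echo "packaged: $bin"
    else
        echo "unpackaged: $bin"
    fi
}

phantomjs_origin || true
```

If it reports a packaged binary under /usr/bin, the apt-get build is the one Selenium will pick up unless another location comes earlier in PATH.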

Alternatively, use npm to install a version for you:

npm install phantomjs-prebuilt

This will then put a link to the binary in node_modules/.bin/phantomjs.
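If you want that npm-installed binary to take precedence, one option is to prepend the .bin directory to PATH. This is a minimal sketch that assumes `npm install` was run in the current directory; the variable name `NPM_BIN` is illustrative, and a cron job would need the same PATH set in its own environment.

```shell
#!/bin/sh
# Assumption: npm install phantomjs-prebuilt was run in the current directory.
NPM_BIN="$PWD/node_modules/.bin"

# Prepend the directory only if it is not already on PATH
case ":$PATH:" in
    *":$NPM_BIN:"*) ;;
    *) PATH="$NPM_BIN:$PATH"; export PATH ;;
esac

# Show which phantomjs now resolves first, if any
command -v phantomjs || echo "phantomjs not found on PATH"
```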

For the reasons behind the limitations of the apt-get version, you can read the README.Debian file shipped with the package:

Limitations

Unlike original "phantomjs" binary that is statically linked with modified QT+WebKit, Debian package is built with system libqt5webkit5. Unfortunately the latter do not have webSecurity extensions therefore "--web-security=no" is expected to fail.

https://github.com/ariya/phantomjs/issues/13727#issuecomment-155609276


Ghostdriver is crippled due to removed source-less pre-built blobs:

src/ghostdriver/third_party/webdriver-atoms/*

Therefore all PDF functionality is broken.


PhantomJS cannot run in headless mode (if there is no X server available).

Unfortunately it can not be fixed in Debian. To achieve headless-ness upstream statically link with customised QT + Webkit. We don't want to ship forks of those projects. It would be great to eventually convince upstream to use standard libraries. Meanwhile one can use "xvfb-run" from "xvfb" package:

xvfb-run --server-args="-screen 0 640x480x16" phantomjs

If you don't want to change your PATH for phantomjs, you can pass the binary location as an extra capability:

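library(RSelenium)

# Start the Selenium server and give it a few seconds to come up
selServ <- startServer()
Sys.sleep(5)

# Point Selenium at a specific phantomjs binary instead of relying on PATH
pBin <- list(phantomjs.binary.path = "/home/john/node_modules/phantomjs-prebuilt/lib/phantom/bin/phantomjs")
rd <- remoteDriver(browserName = "phantomjs", extraCapabilities = pBin)
rd$open()

rd$navigate("https://www.Facebook.com")
searchBar <- rd$findElement(using = "id", "email")

# Close the browser session and stop the server
rd$close()
selServ$stop()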
jdharrison
  • Appreciate the feedback. Seems like I recall reading about this. What throws me is that it has always worked in the past and I don't recall updating anything. All the same, this seems like a good starting point and I'll look into it later today and report back. – ander2ed Aug 23 '16 at 13:11
  • Top stuff. Added some info on starting phantomjs when bin is not in PATH. – jdharrison Aug 23 '16 at 13:18
  • Well, it did seem to be an issue with the `phantomjs` version. Like I said, my jobs had not failed until rather recently, so perhaps I ran an `apt-get upgrade` or `apt-get dist-upgrade` at some point. I'll have to be more careful with that moving forward. Thanks for the help. – ander2ed Aug 23 '16 at 21:35
  • @ander2ed good to hear. If you are interested in headless browsers you may find firefox/chrome run from a docker container to be of use. There is a vignette detailing the basics at http://rpubs.com/johndharrison/RSelenium-Docker – jdharrison Aug 23 '16 at 21:38