
I am attempting to run a DAG on Cloud Composer that uses Selenium to scrape a web page every week.

I have already tried uploading a chromedriver binary to GCS and passing its path when creating the webdriver.Chrome() instance, though I imagine this is not the best way to do this.

Airflow is giving this error:

Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Any tips on updating Cloud Composer's PATH variable would be greatly appreciated. If I need to provide more info, drop a comment and I'll add it.
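For anyone debugging the same error, a minimal sketch of checking and extending PATH from inside the task itself may help. The helper name `ensure_on_path` and the idea of pointing it at a directory on the worker are assumptions for illustration, not something Composer provides; the change only affects the current process.

```python
import os
import shutil


def ensure_on_path(directory, executable="chromedriver"):
    """Prepend `directory` to this process's PATH if `executable` is not found.

    Hypothetical helper: returns the resolved path of the executable,
    or None if it is still missing after the directory was added.
    """
    found = shutil.which(executable)
    if found is None:
        # Only the current process (for example, one Airflow task run) sees this change.
        os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")
        found = shutil.which(executable)
    return found
```

If this returns None, the driver file is either not in that directory on the worker or is not marked executable, which is a different problem from PATH itself.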

2 Answers

0

There was no official answer, and neither the Composer nor the GKE Slack channel was able to help. The real problem was that the binaries were not installed on Composer. The best answer for now is to manually SSH into each of your GKE airflow-workers and install Google Chrome yourself: https://linuxize.com/post/how-to-install-google-chrome-web-browser-on-ubuntu-18-04/

Then place the chromedriver that matches the version of Chrome you installed in your dags/dependencies folder and reference its path when instantiating your WebDriver object. Hope this helps!
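As a rough sketch of referencing the driver from the dags/dependencies folder: the helper names `stage_driver` and `make_browser` are hypothetical, and copying the file to a local temp directory before use is an assumption on my part (files synced from the bucket may not keep their execute bit). The `Service` call is the Selenium 4 style.

```python
import os
import shutil
import stat
import tempfile


def stage_driver(dag_file, name="chromedriver", subdir="dependencies"):
    """Copy the driver stored next to the DAG into a temp dir and make it executable."""
    source = os.path.join(os.path.dirname(os.path.abspath(dag_file)), subdir, name)
    if not os.path.isfile(source):
        raise FileNotFoundError(f"No driver found at {source}")
    target = os.path.join(tempfile.mkdtemp(prefix="chromedriver-"), name)
    shutil.copyfile(source, target)
    # Assumption: the synced copy may lack execute permission, so add it for the owner.
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)
    return target


def make_browser(dag_file):
    """Create a Chrome WebDriver using the staged driver (needs selenium and Chrome)."""
    # Imported lazily so stage_driver can be used and tested without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=stage_driver(dag_file)))
```

Inside the DAG file this would be called as `make_browser(__file__)` from the task callable, so the path is resolved relative to wherever the dags folder is mounted on the worker.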

0

You can create a Dockerfile and include the command to install Chrome in it. Alternatively, as mentioned by Alex, you can manually install Chrome on all worker nodes.

  1. Follow this tutorial to connect to your worker nodes using Cloud Shell: https://towardsdatascience.com/connect-airflow-worker-gcp-e79690f3ecea

  2. Once inside the worker, run the following to install Chrome:

sudo apt-get update

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

sudo dpkg -i google-chrome-stable_current_amd64.deb

If you get a dependency error, run the command below and then run the install command again:

sudo apt --fix-broken install

To check the Chrome installation, run:

google-chrome --version

Now check where the Chrome binary is installed:

which google-chrome-stable

Copy this path and set it as binary_location in the Selenium options:

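from selenium import webdriver
from selenium.webdriver.chrome.service import Service  # Selenium 4 style
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.binary_location = '/usr/bin/google-chrome-stable'  # path printed by `which`
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)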

If you also need ChromeDriver, ChromeDriverManager().install() (from the webdriver-manager package) downloads it on the fly when the webdriver object is created, as shown above.
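The `which` and `--version` checks can also be done from Python, which avoids hard-coding the binary path in the DAG. This is only a sketch: `find_chrome`, `chrome_major_version` and `make_browser` are hypothetical names, and the `--headless` and `--no-sandbox` flags are my assumption about what a worker with no display typically needs, not something confirmed for Composer.

```python
import re
import shutil
import subprocess


def find_chrome(candidates=("google-chrome-stable", "google-chrome")):
    """Return the full path of the first Chrome executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def chrome_major_version(binary):
    """Run `<binary> --version` and return the major version number as an int."""
    output = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    ).stdout
    match = re.search(r"(\d+)\.\d+", output)
    if match is None:
        raise ValueError(f"Could not parse a version from {output!r}")
    return int(match.group(1))


def make_browser():
    """Build a Chrome WebDriver (needs selenium, webdriver-manager and Chrome)."""
    # Imported lazily so the helpers above work without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    binary = find_chrome()
    if binary is None:
        raise RuntimeError("Chrome is not installed on this worker")
    options = webdriver.ChromeOptions()
    options.binary_location = binary
    options.add_argument("--headless")    # assumption: the worker has no display
    options.add_argument("--no-sandbox")  # assumption: often needed when running as root
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
```

The major version is useful if you ship your own chromedriver instead of using ChromeDriverManager, since the driver has to match the installed Chrome.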