3

I'm trying to build up Airflow DAGs that read data from (or write data to) some Google spread sheets. Among the connections in Airflow I've saved a connection of type "Google Cloud Platform" which includes project_id, scopes and on "Keyfile JSON", a dictionary with "type","project_id","private_key_id","private_key","client_email","client_id", "auth_uri","token_uri","auth_provider_x509_cert_url","client_x509_cert_url"

I can connect to the Google Spread Sheet using

cred_dict = ... same as what I saved in Keyfile JSON ...
creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict,scope)
client = gspread.authorize(creds)
sheet = client.open(myfile).worksheet(mysheet) # works!

But I would prefer to not write explicitly the key in the code and, instead, import it from Airflow connections.

I'd like to know if there is a solution of the like of

from airflow.hooks.some_hook import get_the_keyfile
conn_id = my_saved_gcp_connection
cred_dict = get_the_keyfile(gcp_conn_id=conn_id)
creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict,scope)
client = gspread.authorize(creds)
sheet = client.open(myfile).worksheet(mysheet)

I see there are several hooks to GCP connections https://airflow.apache.org/howto/connection/gcp.html but my little knowledge makes me failing in understanding which one to use and which function (if any) extract the keyfile from the saved connection.

Any suggestion would be greatly welcomed :)

Carlo
  • 395
  • 1
  • 8
  • 22

2 Answers2

2

Below is the code I'm using to connect to gspread sheets from Airflow using a stored connection.

import json
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook

def get_cred_dict(conn_id='my_google_connection'):
    gcp_hook = GoogleCloudBaseHook(gcp_conn_id=conn_id)
    return json.loads(gcp_hook._get_field('keyfile_dict'))

def get_client(conn_id='my_google_connection'):
    cred_dict = get_cred_dict(conn_id)
    creds = ServiceAccountCredentials.from_json_keyfile_dict(cred_dict, scope)
    client = gspread.authorize(creds)
    return client

def get_sheet(doc_name, sheet_name):
    client = get_client()
    sheet = client.open(doc_name).worksheet(sheet_name)
    return sheet
Carlo
  • 395
  • 1
  • 8
  • 22
0

With Airflow 2.5.1 (year 2023) the following code works too.

from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
import gspread

# Create a hook object
# When using the google_cloud_default we can use 
# hook = GoogleBaseHook()
# Or for a deligate use: GoogleBaseHook(delegate_to='foo@bar.com')
hook = GoogleBaseHook(gcp_conn_id='my_google_cloud_conn_id') 

# Get the credentials
credentials = hook.get_credentials()

# Optional, set the delegate email if needed later. 
# You need a domain wide delegate service account to use this.
credentials = credentials.with_subject('foo@bar.com')

# Use the credentials to authenticate the gspread client
gc = gspread.Client(auth=credentials)

# Create Spreadsheet
gc.create('Yabadabadoooooooo') # Optional use folder_id=
gc.list_spreadsheet_files()

Wxll

Resources:

Wxll
  • 316
  • 3
  • 5