I'm trying to run a DAG in Google Cloud Composer whose first task makes an HTTP GET request to an API and then uses the Python client library to insert the resulting JSON into a BigQuery table. I am trying to run this function: https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.insert_rows_json.html

import requests
import datetime
import ast
import numpy as np
from airflow import models
from airflow.contrib.operators import bigquery_operator
from airflow.operators import python_operator
import google.cloud.bigquery as bigquery

client = bigquery.Client(project='is-flagship-data-api-sand')
dataset_id = 'Mobile_Data_Test'
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table('sample_wed')
table = client.get_table(table_ref)

def get_localytics_data():
    # api_key and api_secret are assumed to be defined elsewhere in the file
    profiles_requests_command = "https://%s:%s@api.localytics.com/v1/exports/profiles/%d/profile" % (api_key, api_secret, 28761)
    res_profiles = requests.get(profiles_requests_command)
    if res_profiles.status_code == 200:
        data = res_profiles.content
        data_split = data.split('\n')[:-1]
        data_split_ast = [ast.literal_eval(x) for x in data_split]

        # strip the first four characters from each key to get clean column names
        data_split_ast_pretty = [dict(zip(map(lambda x: x[4:], item.keys()), item.values())) for item in data_split_ast]


        # add the current UTC timestamp to each record
        current_time = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
        for item in data_split_ast_pretty:
            item['DateCreated'] = current_time


        random_sample = list(np.random.choice(data_split_ast_pretty, 5))
        print(random_sample)
        client.insert_rows_json(table=table, json_rows=random_sample)
    else:
        pass  # non-200 responses are silently ignored




run_api = python_operator.PythonOperator(task_id='call_api',
        python_callable=get_localytics_data)

I added the following PyPI packages:

requests ===2.19.1

numpy ===1.12.0

google-cloud-bigquery ===1.4.0

I get the error Broken DAG: [/home/airflow/gcs/dags/composer_test_july30_v2.py] 'Client' object has no attribute 'get_table' in the Airflow web UI.

All of the code shown works locally but fails in Cloud Composer.
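Note that the Client construction and the get_table call run at module level, so they execute when the scheduler parses the DAG file, which is why the failure surfaces as a Broken DAG error rather than a task failure. As a sketch (not a fix for the underlying version problem), moving the BigQuery calls inside the callable lets the DAG parse, so the traceback appears in the task logs instead:

import google.cloud.bigquery as bigquery

def insert_sample_rows(rows):
    # construct the client lazily, at task run time rather than DAG parse time
    client = bigquery.Client(project='is-flagship-data-api-sand')
    table_ref = client.dataset('Mobile_Data_Test').table('sample_wed')
    table = client.get_table(table_ref)  # still raises if the installed client is too old
    client.insert_rows_json(table=table, json_rows=rows)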

Ismail

2 Answers


It sounds like you have an out-of-date google-cloud-bigquery package, although given your pinned versions it appears it should not be.

To check for sure, SSH into the Composer environment's Google Kubernetes Engine (GKE) cluster and run pip freeze | grep bigquery to find out which version is actually installed:

  1. Go to https://console.cloud.google.com/kubernetes/list
  2. Find the corresponding GKE cluster and click on it.
  3. Click Connect at the top.
  4. Once in the console, type kubectl get pods. A list of pods should appear.
  5. Enter kubectl exec -it <AIRFLOW_WORKER> /bin/bash, where <AIRFLOW_WORKER> is one of the pods whose name starts with airflow-worker-.
  6. Once inside the pod, type pip freeze | grep bigquery; it should show the installed version of the module.
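
If shelling into the cluster is inconvenient, a rough alternative is a throwaway task that logs the version from a worker (this assumes the installed package exposes __version__, which 1.x releases of google-cloud-bigquery do):

import google.cloud.bigquery as bigquery
from airflow.operators import python_operator

def log_bigquery_version():
    # prints the google-cloud-bigquery release the worker actually imports
    print(bigquery.__version__)

check_version = python_operator.PythonOperator(task_id='log_bigquery_version',
        python_callable=log_bigquery_version)

The output shows up in the task logs for log_bigquery_version.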
cjmoberg
  • By using googleapiclient.discovery https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/python/latest/bigquery_v2.tabledata.html and the insertAll streaming API, I was able to get the DAG to load properly in Airflow. – Kenzie Tahiri Jul 31 '18 at 20:11
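
For reference, a rough sketch of the workaround that comment describes, using the googleapiclient.discovery client and the tabledata().insertAll streaming endpoint; it assumes google-api-python-client is available in the environment, and the project, dataset, and table names are taken from the question:

from googleapiclient import discovery

def insert_rows_via_discovery(rows):
    # build a BigQuery v2 client; on Composer this picks up application default credentials
    service = discovery.build('bigquery', 'v2')
    # insertAll expects each row wrapped in a {"json": ...} envelope
    body = {'rows': [{'json': row} for row in rows]}
    return service.tabledata().insertAll(
        projectId='is-flagship-data-api-sand',
        datasetId='Mobile_Data_Test',
        tableId='sample_wed',
        body=body).execute()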

Google now publishes the Python packages and versions on the Composer images page: https://cloud.google.com/composer/docs/concepts/versioning/composer-versions. Just find the row for the version you are using, then expand the Packages column.

This is now by far the easiest way to get a package version, because pip freeze on an Airflow worker now produces a list that cites the wheel file location rather than the version number. For example:

airflow@airflow-worker-*****:~$ pip freeze
absl-py @ file:///usr/local/lib/airflow-pypi-dependencies-2.1.4/python3.8/absl_py-1.0.0-py3-none-any.whl
alembic @ file:///usr/local/lib/airflow-pypi-dependencies-2.1.4/python3.8/alembic-1.7.1-py3-none-any.whl
amqp @ file:///usr/local/lib/airflow-pypi-dependencies-2.1.4/python3.8/amqp-2.6.1-py2.py3-none-any.whl
anyio @ file:///usr/local/lib/airflow-pypi-dependencies-2.1.4/python3.8/anyio-3.3.1-py3-none-any.whl
etc....
ChrisFal