Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions or define your own in pure Python.
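For illustration, a minimal sketch of such a DAG (Airflow 2 import paths; the DAG name, schedule, and tasks are made up):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def greet():
        # A task can run arbitrary Python; this one just prints a message.
        print("Hello from Cloud Composer")


    with DAG(
        dag_id="example_dag",            # hypothetical DAG name
        schedule_interval="@daily",      # run once per day
        start_date=datetime(2023, 1, 1),
        catchup=False,
    ) as dag:
        say_hello = PythonOperator(task_id="say_hello", python_callable=greet)
        list_tmp = BashOperator(task_id="list_tmp", bash_command="ls /tmp")

        # ">>" defines the graph's edges: say_hello runs before list_tmp.
        say_hello >> list_tmp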

While technically you can do data processing directly within a task (an instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything: a container, BigQuery, Spark, and so on). Often you will then wait for that processing to complete using an Airflow sensor operator, and possibly launch further dependent tasks.
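One common shape for this pattern, sketched with the Google provider's GCS sensor (the bucket, object, and task names are assumptions):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

    with DAG(
        dag_id="wait_for_external_job",   # hypothetical
        schedule_interval=None,
        start_date=datetime(2023, 1, 1),
        catchup=False,
    ) as dag:
        # Kick off processing in another system; a placeholder command stands in
        # for submitting a Spark job, starting a container, etc.
        submit_job = BashOperator(
            task_id="submit_job",
            bash_command="echo 'submit the external job here'",
        )

        # Poll until the external system writes its output file to Cloud Storage.
        wait_for_output = GCSObjectExistenceSensor(
            task_id="wait_for_output",
            bucket="my-results-bucket",        # hypothetical bucket
            object="output/part-00000.csv",    # hypothetical object
            poke_interval=60,
        )

        submit_job >> wait_for_output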

While Cloud Composer is managed, you can apply a variety of customizations, such as which PyPI packages to install, the hardware configuration, and environment variables. Cloud Composer allows overriding some, but not all, Airflow configuration settings.
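As a rough sketch of how those customizations surface inside a task, assuming a REPORTS_BUCKET environment variable (made up for this example) was set on the environment and the core/parallelism setting was overridden:

    import os

    from airflow.configuration import conf


    def show_environment_settings():
        # Environment variables set on the Composer environment are visible to
        # tasks through the normal process environment.
        print(os.environ.get("REPORTS_BUCKET", "unset"))   # hypothetical variable

        # Airflow configuration overrides applied through Composer show up in
        # the regular Airflow configuration object.
        print(conf.get("core", "parallelism"))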

Further technical details: Cloud Composer will create a Kubernetes cluster for each Airflow environment you create. This is where your tasks will be run, but you don't have to manage it. You will place your code within a specific bucket in Cloud Storage, and Cloud Composer will sync it from there.
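For example, deploying a DAG is just an object upload into that bucket's dags/ prefix; a sketch with the Cloud Storage client library (the bucket name below is a placeholder, in practice you look it up in the environment's details):

    from google.cloud import storage

    # The DAGs bucket is created by Cloud Composer; the name below is a placeholder.
    client = storage.Client()
    bucket = client.bucket("us-central1-my-env-12345678-bucket")

    # Composer watches the dags/ prefix and syncs new files to the Airflow workers.
    bucket.blob("dags/example_dag.py").upload_from_filename("example_dag.py")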

1225 questions
3 votes, 2 answers

GCP Apache Airflow - How to install Python package from a private repository and import on DAG?

I have a private repository. This repository contains common functions used by my DAGs (for example, datetime validators and a response encoder function). I want to import this repository's functions in my DAG file, and I used this link to do it. I created…
3 votes, 1 answer

Create Cloud Composer environment with a private repository using Terraform

I'm trying to create a Cloud Composer environment with a PyPI package from a private repository using Terraform. Cloud Composer supports private PyPI repositories. However, configuring a private repository requires an existing composer bucket. When…
ollik1 • 4,460
3 votes, 1 answer

Cloud Composer - DAG Task Log File is not Found

For a few days now, some tasks have thrown an error at the start of every DAG run. It seems the log file cannot be found to retrieve the logging from the task. *** 404 GET…
3 votes, 1 answer

Cloud Composer - DAG error: java.lang.ClassNotFoundException: Failed to find data source: bigquery

I'm trying to execute a DAG in Cloud Composer which creates a Dataproc cluster, but it fails when trying to save to BigQuery. I suppose a jar file is missing (--jars gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar), but I don't know how…
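For reference, one hedged way to attach that connector jar when submitting the job from Airflow is through the Dataproc job config (the project, cluster, and file names below are assumptions):

    from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

    # Hypothetical PySpark job that writes to BigQuery; the spark-bigquery
    # connector jar is attached via jar_file_uris.
    pyspark_job = {
        "reference": {"project_id": "my-project"},
        "placement": {"cluster_name": "my-cluster"},
        "pyspark_job": {
            "main_python_file_uri": "gs://my-bucket/jobs/load_to_bq.py",
            "jar_file_uris": ["gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"],
        },
    }

    submit_pyspark = DataprocSubmitJobOperator(
        task_id="submit_pyspark",
        job=pyspark_job,
        region="us-central1",
        project_id="my-project",
    )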
3 votes, 5 answers

Receiving HTTP 401 when accessing Cloud Composer's Airflow Rest API

I am trying to invoke Airflow 2.0's stable REST API from Cloud Composer version 1 via a Python script and encountered an HTTP 401 error while referring to Triggering DAGs with Cloud Functions and Access the Airflow REST API. The service account has…
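A hedged sketch of the usual authentication pattern for this setup: the Composer 1 web server sits behind IAP, so requests need an OIDC ID token minted for the web server's IAP client ID (the URL and client ID below are placeholders):

    import google.auth.transport.requests
    import google.oauth2.id_token
    import requests

    # Placeholders: the Airflow web server URL and its IAP OAuth client ID.
    WEBSERVER_URL = "https://xxxxxxxx-tp.appspot.com"
    CLIENT_ID = "xxxxxxxx.apps.googleusercontent.com"

    # Mint an OIDC ID token for the IAP client ID using the ambient credentials
    # (for example, the service account running this script).
    auth_request = google.auth.transport.requests.Request()
    token = google.oauth2.id_token.fetch_id_token(auth_request, CLIENT_ID)

    resp = requests.get(
        f"{WEBSERVER_URL}/api/v1/dags",
        headers={"Authorization": f"Bearer {token}"},
    )
    print(resp.status_code)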
3 votes, 1 answer

How to add a SSL postgres connection using CLI for Cloud Composer?

Using Airflow locally, I was able to add an SSL Postgres connection using this: ./airflow.sh connections add connection_name --conn-uri 'postgres://user:@host:port/db?sslmode=verify-ca&sslcert=<>.crt&sslca=<>.crt&sslkey=<>.key.pk8' Now I'm using…
3 votes, 1 answer

I keep getting '[Errno 101] Connection timed out' when running python script on Airflow

I keep getting '[Errno 101] Connection timed out' when I try to call an API using Python on the Airflow provided by Google Cloud Composer. Here's my code: r1 = requests.post( url1, params=request1_params, …
3 votes, 2 answers

Get the client_id of the IAM proxy on GCP Cloud composer

I'm trying to trigger an Airflow DAG inside a Composer environment with Cloud Functions. In order to do that I need to get the client ID as described here. I've tried with a curl command but it doesn't return any value. With a Python script I keep…
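A hedged sketch of the commonly used trick: hit the Airflow web server unauthenticated and read the IAP client ID out of the redirect to the Google sign-in page (the web server URL is a placeholder):

    from urllib.parse import parse_qs, urlparse

    import requests

    # Placeholder: the Composer 1 Airflow web server URL.
    WEBSERVER_URL = "https://xxxxxxxx-tp.appspot.com"

    # An unauthenticated request is redirected to the sign-in page; the IAP
    # OAuth client ID is carried in that redirect's query string.
    redirect = requests.get(WEBSERVER_URL, allow_redirects=False)
    query = urlparse(redirect.headers["Location"]).query
    client_id = parse_qs(query)["client_id"][0]
    print(client_id)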
3 votes, 0 answers

Handling Airflow DAG changes through time (DAG Versioning)

We have a relatively complex dynamic DAG as part of our ETL. The DAG contains hundreds of transformations and is created programmatically based on a set of YAML files. It changes over time: new tasks are added, queries executed by tasks are changed…
partlov • 13,789
3 votes, 0 answers

Airflow: Why do DAG tasks run outdated DAG code?

I am running Airflow (1.10.9) through Cloud Composer (1.11.1) on GCP. Whenever I update a DAG's code I can see the updated code refreshed in the Airflow GUI, but for at least 10 minutes the DAG's tasks still run the old code. A couple of…
AYR • 1,139
3 votes, 0 answers

How to install extra linux package in GCP Composer?

I have written a DAG which uses the mongoexport command in a BashOperator. By default the mongoexport package is not installed in Composer. I need to install it using the command below: sudo apt install mongo-tools. We can directly install PyPI packages in…
Joseph D • 189
3 votes, 2 answers

GCP Cloud Composer: get_client_id.py error with required arguments

I have a question about GCP Cloud Composer. To verify the function that triggers a DAG (workflow), I would like to get the client ID by referring to the Python code in the following…
3 votes, 2 answers

How to run cloud composer task which loads data into other project BigQuery Table

I have my Cloud Composer environment created under project-A and I want to load data into a BigQuery table in project-B. I know about the GCSToBigQueryOperator task, but it's failing rather than succeeding, and I want to know how I can achieve this. From…
user3065757 • 475
3 votes, 0 answers

Is it feasible to use the Vertical Pod Autoscaler with Airflow on a task level?

I currently use Airflow (via Cloud Composer) with the Celery Executor and the KubernetesPodOperator. One challenge I have is to use resources efficiently when some Airflow tasks use relatively little memory and others use many GB of memory. It would…
RayB • 2,096
3 votes, 0 answers

Retrieve and pass the result of an Airflow SSHOperator task to another task?

I need to retrieve the output of a bash command (which will be the size of a file) in an SSHOperator. I will use this value as a condition check to branch out to other tasks. I'm using XCom to try to retrieve the value and BranchPythonOperator to…
comet • 65