Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python.
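For illustration, here is a minimal sketch of what such a DAG might look like, combining a built-in operator with a custom one defined in pure Python (all names are hypothetical, not taken from any real project):

```python
# Minimal Airflow 2.x sketch: one built-in operator, one custom operator.
from datetime import datetime

from airflow import DAG
from airflow.models.baseoperator import BaseOperator
from airflow.operators.bash import BashOperator


class GreetOperator(BaseOperator):
    """A trivial custom operator; execute() holds the task logic."""

    def __init__(self, name: str, **kwargs):
        super().__init__(**kwargs)
        self.name = name

    def execute(self, context):
        print(f"Hello, {self.name}!")


with DAG(
    dag_id="example_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    greet = GreetOperator(task_id="greet", name="Composer")
    extract >> greet  # greet runs only after extract succeeds
```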

While technically you can do data processing directly within a task (instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything - a container, BigQuery, Spark, etc). Often you will then wait for that processing to complete using an Airflow sensor operator, possibly launch further dependent tasks, etc.
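A hedged sketch of that pattern, assuming the external processing writes its result to a Cloud Storage bucket (the bucket, object path, and commands are placeholders):

```python
# Pattern sketch: start external processing, wait for its output with a
# sensor, then continue with dependent tasks.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

with DAG(
    dag_id="external_processing_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    # Kick off processing in another system (placeholder command here).
    start_job = BashOperator(task_id="start_job", bash_command="echo submitted")

    # Poll until the external system writes its result to GCS.
    wait_for_output = GCSObjectExistenceSensor(
        task_id="wait_for_output",
        bucket="my-bucket",           # hypothetical bucket
        object="results/output.csv",  # hypothetical object path
    )

    downstream = BashOperator(task_id="downstream", bash_command="echo done")

    start_job >> wait_for_output >> downstream
```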

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip modules to install, hardware configurations, environment variables, etc. Cloud Composer allows overriding some but not all Airflow configuration settings.
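As an illustration, such customizations can be applied with gcloud composer environments update; the environment name, location, and values below are placeholders:

```shell
# Install an extra pip package into the environment
gcloud composer environments update my-environment \
    --location us-central1 \
    --update-pypi-package "pandas==1.5.3"

# Set an environment variable visible to all Airflow processes
gcloud composer environments update my-environment \
    --location us-central1 \
    --update-env-variables ENV=staging

# Override an Airflow configuration setting (section-key format)
gcloud composer environments update my-environment \
    --location us-central1 \
    --update-airflow-configs core-dags_are_paused_at_creation=True
```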

Further technical details: Cloud Composer will create a Kubernetes cluster for each Airflow environment you create. This is where your tasks will be run, but you don't have to manage it. You will place your code within a specific bucket in Cloud Storage, and Cloud Composer will sync it from there.
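A minimal sketch of that deployment flow (the environment name, location, and file name are assumptions):

```shell
# Look up the environment's DAG bucket, then copy a DAG file into it.
DAG_PREFIX=$(gcloud composer environments describe my-environment \
    --location us-central1 \
    --format="get(config.dagGcsPrefix)")

gsutil cp my_dag.py "${DAG_PREFIX}/"
```

Composer syncs the bucket's dags/ folder to the workers, so the scheduler picks up the new file on its next parse.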

1225 questions
3
votes
1 answer

Google Cloud Composer - Deploying Docker Image

Definitely missing something, and could use some quick assistance! Simply, how do you deploy a Docker image to an Airflow DAG for running jobs? Does anyone have a simple example of deploying a Google container and running it via Airflow/Composer?
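One common approach (a sketch, not necessarily the full answer the asker needs) is to run the container image as a task with KubernetesPodOperator from the cncf.kubernetes provider; the image path and names below are placeholders:

```python
# Sketch: run a container image as an Airflow task on the Composer cluster.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG(
    dag_id="docker_image_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    run_container = KubernetesPodOperator(
        task_id="run_container",
        name="run-container",
        namespace="default",
        image="gcr.io/my-project/my-image:latest",  # hypothetical image
        cmds=["python", "main.py"],
    )
```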
3
votes
3 answers

Is it possible to have a pipeline in Airflow that does not tie to any schedule?

I need to have a pipeline that will be executed either manually or programmatically. Is that possible with Airflow? It looks like right now each workflow MUST be tied to a schedule.
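For reference, a DAG defined with schedule_interval=None is never scheduled automatically; it runs only when triggered manually or programmatically. A minimal sketch:

```python
# A DAG with no schedule: it runs only on manual or programmatic triggers.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="manual_only",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # no timetable; trigger via UI, CLI, or API
) as dag:
    BashOperator(task_id="hello", bash_command="echo triggered")
```

Such a DAG can then be triggered with, for example, airflow dags trigger manual_only, or on Composer via gcloud composer environments run.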
3
votes
1 answer

How to update DAGs in Google Cloud Composer

I want to automate the deployment of DAGs written in a certain repository. To achieve that I use the gcloud tool and this just imports the DAGs and Plugins according to the documentation. Now the problem is that when changing the structure of a DAG…
Mohammed Ajil
  • 65
  • 1
  • 7
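For context, the gcloud import the excerpt above refers to typically takes this form (environment name, location, and source path are placeholders):

```shell
# Import a DAG file (or directory) into the environment's dags/ folder.
gcloud composer environments storage dags import \
    --environment my-environment \
    --location us-central1 \
    --source ./dags/my_dag.py
```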
2
votes
0 answers

Pulling XCOM when creating a dataproc cluster with DataprocCreateClusterOperator in airflow

I'm trying to pull an XCOM variable when creating a dataproc cluster in airflow with DataprocCreateClusterOperator. I need it because I need a variable that I am pulling by querying an API and saving as XCOM variable in the preceding step. What I'm…
2
votes
1 answer

Where is the actual documentation for the composer python SDK?

Am I crazy or is this documentation non-existent: https://cloud.google.com/python/docs/reference/composer/latest "Code samples and snippets live in the samples/ folder." What samples folder? The repo also has nothing:…
red888
  • 27,709
  • 55
  • 204
  • 392
2
votes
1 answer

Trigger the Cloud composer dag run manually from java code using a client

I have a Cloud Composer DAG which has its schedule property set to None and needs to be triggered. I have uploaded my DAG code to the Cloud Composer GCS folder and tried to trigger it from my local machine using my local gcloud credentials as suggested in the…
2
votes
1 answer

variables by default in airflow and BigQueryToGCSOperator

I have an airflow DAG with the following task. At runtime, there is an error because there are special characters in the job_id. How can I correctly pass the variables to the task definition? date='{{ ds }}' job_id=date+'{{ ts…
franco pina
  • 193
  • 1
  • 1
  • 14
2
votes
0 answers

Sendgrid HTTP: 400 error using Cloud Composer

I'm trying to set up an Airflow DAG that is able to send emails through the EmailOperator in Composer 2, Airflow 2.3.4. I've followed this guide. I tried running the example DAG that is provided in the guide, but I get an HTTP 400 error. The log…
ramoniazzz
  • 43
  • 2
2
votes
2 answers

Cloud Composer v2 API Service Agent Extension role might be missing

I am trying to spin up GCP Cloud Composer using the below set of terraform script code base: resource "google_composer_environment" "test" { name = "example-composer-env-tf-c2" region = "us-central1" config { software_config { …
2
votes
0 answers

Airflow 2.1.4 Composer V2 GKE kubernetes in custom VPC Subnet returning 404

So I have two V2 Composers running in the same project; the only difference between them is that in one I'm using the default subnet and default/autogenerated values for cluster-ipv4-cidr & services-ipv4-cidr. In the other one I've…
2
votes
1 answer

Command to retrieve SMTP Password by airflow

How do I retrieve the SMTP password in Airflow using the smtp_password_cmd value? I have added all the environment variables in the Composer environment, which in turn overrides the Airflow configuration. https://i.stack.imgur.com/n00HE.jpg (refer to this image) Please…
2
votes
1 answer

How to run docker image inside GCP Compute Engine instance with Apache Airflow

I am trying to create an Airflow DAG from which I want to spin up a Compute Engine instance with a Docker image stored in Google Container Registry. In other words, I want to replicate gcloud compute instances create-with-container with airflow dags…
2
votes
1 answer

Cloud Composer writes a file that disappears after DAG execution

I was trying to write a txt file to the dags folder in a Cloud Composer DAG. The file never showed up, and I thought there was something wrong with my code, but then I tried saving a pandas dataframe in xlsx format to the DAGs folder and loading that…
2
votes
1 answer

Airflow schedule interval stuck at '1 day, 0:00:0'

I need to change the time my DAG runs at to be midday. I followed the advice from various questions on here and deleted the DAG and uploaded a renamed version with a new dag_id. Even though the original DAG was renamed from, say, dag_1.py to…
CClarke
  • 503
  • 7
  • 18
2
votes
1 answer

Google Cloud Dataproc cluster creation failure due to Cloud SQL proxy initialization error

I am trying to create a Dataproc cluster from a Cloud Composer DAG using the DataprocCreateClusterOperator of Airflow. I need to access Cloud SQL from my Dataproc cluster hence need to install the Cloud SQL proxy on the cluster as well. I am…