Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job represented as a DAG (directed acyclic graph) of operators. You can use Airflow's built-in operator definitions or define your own in pure Python.

While technically you can do data processing directly within a task (instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything - a container, BigQuery, Spark, etc). Often you will then wait for that processing to complete using an Airflow sensor operator, possibly launch further dependent tasks, etc.

While Cloud Composer is managed, it supports a variety of customizations, such as which PyPI packages to install, hardware configuration, and environment variables. Cloud Composer allows overriding some, but not all, Airflow configuration settings.

Further technical details: Cloud Composer creates a Kubernetes cluster for each Airflow environment you create. This is where your tasks run, but you don't have to manage it. You place your DAG code in a specific Cloud Storage bucket, and Cloud Composer syncs it from there.

1225 questions
3 votes, 1 answer

GCP Composer Airflow tasks stuck or frozen

I am setting up some ETL pipelines on Google's Composer Airflow, deployed on a 3-node GKE cluster — the minimum Composer setup on GCP. Version: 1.10.1-composer; GCP image version: composer-1.6.0-airflow-1.10.1. I would normally log into the Airflow machine…
3 votes, 0 answers

Why am I seeing the following warning message from Google Cloud Composer even when my variables have updated?

I am creating a Composer environment and added two custom environment variables, gcs_source_bucket and gcs_dest_bucket, and I get the following warning. I checked and the variables have been updated, so I am a bit perplexed as to what the warning is…
George Udosen (906)
3 votes, 1 answer

How to trigger a Cloud Composer DAG using a Cloud Function when a file is added to Cloud Storage, using Python 3.7

Every time a document is placed in a certain bucket, I want to start a DAG workflow to analyse the document. I need to trigger the DAG workflow using a Cloud Function with a Cloud Storage trigger and event type Finalize and Create
3 votes, 1 answer

Using Cloud Functions as operators in a GC Composer DAG

Fellow coders, for a project I'm interested in using Google Cloud Composer to handle several workflows that consist of operations that can be shared between workflows. It seems to me that Cloud Functions are a perfect way of performing tasks as…
3 votes, 1 answer

Authenticate a Google Composer HTTP call task with an IAP-protected app

I have a setup with an App Engine REST application and a Google Composer / Airflow DAG that has a task which is supposed to fetch data from one of the app's endpoints. The app is protected by IAP. I have added the service account…
Vee6 (1,527)
3 votes, 3 answers

How can you run `kubectl apply -f` from a DAG using the BashOperator in Cloud Composer?

I'm trying to apply a config file to create a Pod from Cloud Composer using the BashOperator. First I tried the PodOperator, but it doesn't allow passing a spec file; it just builds from the image. I tried using the BashOperator since the worker…
3 votes, 2 answers

How to set path to chromedriver in Google Cloud Composer

I am attempting to run a DAG on Cloud Composer that uses Selenium to scrape a web page every week. I have already tried passing the path of a driver that I uploaded to GCS when creating the webdriver.Chrome() instance, though I imagine this is…
3 votes, 2 answers

Unable to Create Composer Environment On GCP

I created a Composer environment on a GCP development environment with a service account that has the following permissions: Composer Administrator, Composer Worker, Kubernetes Engine Admin, Storage Object Admin, BigQuery Admin, and Cloud SQL Admin. I was able…
Midhun T (111)
3 votes, 1 answer

How to set the CPUs quota for Google Cloud Dataproc via Cloud Composer?

Trying out the Google Cloud Composer Quickstart in a free trial account, the example workflow DAG's first task runs this operator: create_dataproc_cluster = dataproc_operator.DataprocClusterCreateOperator( task_id='create_dataproc_cluster', …
Jerry101 (12,157)
3 votes, 3 answers

How to pass a query parameter to a SQL file using the BigQuery operator

I need to access the parameter passed by BigQueryOperator in a SQL file, but I am getting the error "queryParameters argument must have a type not". I am using the code below: t2 =…
ganesh_patil (356)
3 votes, 0 answers

DummyOperator marked upstream_failed yet all upstream tasks marked success

I have an Airflow pipeline that produces 12 staging tables from Google Cloud Storage files and then performs some downstream processing. I have a DummyOperator to collect these tasks before proceeding to the next stages. I'm getting an error on the…
JohnB (1,743)
3 votes, 1 answer

Log link of failed Hive job submitted to Dataproc through Airflow

I have submitted a Hive job to a Dataproc cluster using Airflow's DataprocWorkflowTemplateInstantiateInlineOperator. When some of the jobs fail, in googlecloud -> dataproc -> jobs I can see a link to the log with the failure: Google Cloud Dataproc Agent…
3 votes, 1 answer

Unable to install a PyPi package in Cloud Composer

I tried installing boto3 on Composer, but after some time I received the following error. Any ideas what's going on? I typed boto3 as the package name without specifying the version. Thanks
Leo (900)
3 votes, 1 answer

Cloud Composer / Airflow: Relationships can only be set between Operators; received PythonOperator

We have several Airflow DAGs in Cloud Composer that previously worked fine. The code for the DAGs and Operators has not been changed, but after a recent deployment, we now get this error from the DAGs: Broken DAG: [...] Relationships can only be…
MHG (1,410)
3 votes, 1 answer

Does Google Cloud Composer support connecting to AWS resources like S3 and Redshift?

I am planning to use Google Cloud Composer to schedule a workflow that loads data from S3 to Redshift. As S3 and Redshift are both AWS services, I want to know whether Google Cloud Composer allows me to do this. From my understanding after…
yyyyyyyyoung (305)