Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions or define your own in pure Python.
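
For instance, a minimal DAG wiring two built-in operators together might look like the following sketch (the DAG id, schedule, and commands are illustrative, not part of any particular answer):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    # Illustrative pipeline: "load" runs only after "extract" succeeds.
    with DAG(
        dag_id="example_pipeline",       # hypothetical name
        schedule_interval="@daily",
        start_date=datetime(2019, 1, 1),
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extracting")
        load = BashOperator(task_id="load", bash_command="echo loading")

        extract >> load  # the edge of the DAG: extract -> load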

While you can technically do data processing directly within a task (an instantiated operator), more often you will want a task to invoke processing in another system (which could be anything: a container, BigQuery, Spark, etc.). Often you will then wait for that processing to complete using an Airflow sensor, and possibly launch further dependent tasks.
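
A sketch of that trigger-and-wait pattern, assuming a hypothetical bucket and output path and the contrib GCS sensor shipped with newer Airflow 1.10.x releases:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.contrib.sensors.gcs_sensor import GoogleCloudStorageObjectSensor

    with DAG(
        dag_id="trigger_and_wait",
        schedule_interval="@daily",
        start_date=datetime(2019, 1, 1),
        catchup=False,
    ) as dag:
        # Kick off processing in some external system (stubbed with a shell command).
        start_export = BashOperator(
            task_id="start_export",
            bash_command="echo 'submit an export job here'",
        )

        # Poll Cloud Storage until the external job's output object appears.
        wait_for_output = GoogleCloudStorageObjectSensor(
            task_id="wait_for_output",
            bucket="my-output-bucket",        # hypothetical bucket
            object="exports/{{ ds }}/_DONE",  # hypothetical marker object
            poke_interval=60,
        )

        start_export >> wait_for_output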

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip packages to install, the hardware configuration, and environment variables. Cloud Composer allows overriding some, but not all, Airflow configuration settings.

Further technical details: Cloud Composer will create a Kubernetes cluster for each Airflow environment you create. This is where your tasks will be run, but you don't have to manage it. You will place your code within a specific bucket in Cloud Storage, and Cloud Composer will sync it from there.

1225 questions
8 votes • 1 answer

Cloud Composer tasks fail without reason or logs

I run Airflow in a managed Cloud Composer environment (version 1.9.0), which runs on a Kubernetes 1.10.9-gke.5 cluster. All my DAGs run daily at 3:00 AM or 4:00 AM. But sometimes in the morning, I see a few tasks failed without a reason during the…
Ary Jazz • 1,576 • 1 • 16 • 25
8 votes • 5 answers

How to restart webserver in Cloud Composer

We recently hit a known issue on Airflow: "This DAG isn't available in the webserver DagBag object". For now we use a temporary workaround of restarting the whole environment by changing configurations, but this is not an efficient method. The best…
8 votes • 3 answers

How can I restart the airflow server on Google Composer?

When I need to restart the webserver locally I do:
ps -ef | grep airflow | awk '{print $2}' | xargs kill -9
airflow webserver -p 8080 -D
How can I do this on Google Composer? I don't see an option to restart the server in the console.
8 votes • 1 answer

Running docker operator from Google Cloud Composer

As per the documentation, Google Cloud Composer Airflow worker nodes are served from a dedicated Kubernetes cluster: I have a Docker-contained ETL step that I would like to run using Airflow, preferably on the same Kubernetes cluster that is hosting the…
Maxim Veksler • 29,272 • 38 • 131 • 151
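
A common approach for this (a sketch, not necessarily the accepted answer) is the contrib KubernetesPodOperator, which schedules a container onto the Composer environment's own GKE cluster; the image and entrypoint below are hypothetical:

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    with DAG(
        dag_id="docker_etl",
        schedule_interval="@daily",
        start_date=datetime(2019, 1, 1),
        catchup=False,
    ) as dag:
        # Runs the container as a pod in the environment's cluster.
        etl_step = KubernetesPodOperator(
            task_id="etl_step",
            name="etl-step",
            namespace="default",
            image="gcr.io/my-project/my-etl:latest",  # hypothetical image
            cmds=["python", "etl.py"],                # hypothetical entrypoint
        )
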
8 votes • 3 answers

Using Airflow template files and template_searchpath in Google Cloud Composer

I'm using the BigQueryOperator extensively in my Airflow DAGs on Google Cloud Composer. For longer queries, it's better to put each query in its own .sql file rather than cluttering up the DAG with it. Airflow seems to support this for all SQL…
conradlee • 12,985 • 17 • 57 • 93
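
One way this is commonly wired up (a sketch; the paths and file names are assumptions, using a recent 1.10.x contrib BigQueryOperator, and relying on Composer mounting the DAGs bucket at /home/airflow/gcs/dags on workers):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator

    with DAG(
        dag_id="bq_sql_files",
        schedule_interval="@daily",
        start_date=datetime(2019, 1, 1),
        catchup=False,
        # Directories searched when resolving templated file arguments.
        template_searchpath=["/home/airflow/gcs/dags/sql"],  # hypothetical layout
    ) as dag:
        run_query = BigQueryOperator(
            task_id="run_query",
            sql="my_query.sql",  # looked up under template_searchpath and templated
            use_legacy_sql=False,
        )
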
8 votes • 2 answers

How do I select my Airflow or Python version with Cloud Composer?

I am using Cloud Composer and I noticed that it selects the version of Apache Airflow and Python (2.7.x) for me. I want to use a different version of Airflow and/or Python. How can I change this?
James • 2,321 • 14 • 30
7 votes • 1 answer

How do I install the same pip dependencies locally as are installed in my Cloud Composer Airflow environment on GCP?

I'm trying to set up a local development environment in VS Code where I'd get code completion for the packages Cloud Composer/Apache Airflow uses. I've been successful so far using a virtual environment (created with python -m venv .venv) and a very…
Matt Welke • 1,441 • 1 • 15 • 40
7 votes • 1 answer

Airflow triggering the "on_failure_callback" when the "dagrun_timeout" is exceeded

Currently working on setting up alerts for long-running tasks in Airflow. To cancel/fail the Airflow DAG I've put "dagrun_timeout" in the default_args, and it does what I need: it fails/errors the DAG when it's been running for too long (usually stuck).…
Anton • 581 • 1 • 5 • 23
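
For context, dagrun_timeout is a DAG-level argument (not a default_args key), while on_failure_callback in default_args is applied per task; a sketch of that wiring with a hypothetical alert function:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator


    def notify_failure(context):
        # Hypothetical alert hook; context carries the task instance, DAG, etc.
        print("Task failed: {}".format(context["task_instance"]))


    with DAG(
        dag_id="long_running_guard",
        schedule_interval="@daily",
        start_date=datetime(2019, 1, 1),
        catchup=False,
        dagrun_timeout=timedelta(hours=2),  # fails the run if it exceeds 2 hours
        default_args={"on_failure_callback": notify_failure},  # per-task callback
    ) as dag:
        work = BashOperator(task_id="work", bash_command="sleep 30")
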
7 votes • 1 answer

Why is there a DAG named 'airflow_monitoring' automatically generated in Cloud Composer?

When creating an Airflow environment on GCP Composer, a DAG named airflow_monitoring is automatically created, and it comes back even when deleted. Why? How do I handle it? Should I copy this file inside my DAG folder and resign myself to make…
7 votes • 1 answer

Airflow error importing DAG using plugin - Relationships can only be set between Operators

I have written an Airflow plugin that simply contains one custom operator (to support CMEK in BigQuery). I can create a simple DAG with a single task that uses this operator, and it executes fine. However, if I try to create a dependency in the…
Jasper Smith • 95 • 1 • 6
7 votes • 1 answer

How to create custom operators in Airflow and use them in an Airflow template running through Cloud Composer (in Google Cloud Platform)

I need to create a custom Airflow operator which I should be able to use in an Airflow template (written in Python) running in Cloud Composer... If I create a custom Airflow operator, how can I use it in a template which is running on the…
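
As a sketch, a custom operator is just a Python class that subclasses BaseOperator; the module and class names below are hypothetical:

    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults


    class MyCustomOperator(BaseOperator):
        """Hypothetical operator that logs a greeting when executed."""

        @apply_defaults
        def __init__(self, name, *args, **kwargs):
            super(MyCustomOperator, self).__init__(*args, **kwargs)
            self.name = name

        def execute(self, context):
            self.log.info("Hello, %s", self.name)

If this lives in, say, my_operators.py next to the DAG files in the Composer dags/ folder, a DAG can use it with from my_operators import MyCustomOperator, since that folder is on the workers' Python path.
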
7 votes • 2 answers

Cloud Composer (Airflow) jobs stuck

My Cloud Composer managed Airflow got stuck for hours after I canceled a task instance that was taking too long (let's call it Task A). I've cleared all the DAG runs and task instances, but there are a few jobs running and one job with Shutdown…
Ary Jazz • 1,576 • 1 • 16 • 25
7 votes • 6 answers

Where to save service account key file for Google Cloud Composer connection setup?

I am trying to set up a Google Cloud Platform connection in Google Cloud Composer using the service account key. So I created a GCS bucket and put the service account key file in the bucket. The key is stored in JSON. In the keyfile path field I…
7 votes • 4 answers

Composer auto-scaling?

Given that GCP Cloud Composer runs on GKE/GCE, does it auto-scale? Right now I have 3 nodes in the cluster to support, say, 100 DAGs. Later, if I have about 300 DAGs, will it scale itself up (with Celery workers)?
6 votes • 3 answers

How to import custom modules in Cloud Composer

I created a local project with Apache Airflow and I want to run it in Cloud Composer. My project contains custom modules and a main file that calls them. Example:
from src.kuzzle import KuzzleQuery
Structure:
main.py
src/
  kuzzle.py
I have…
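
For reference, Composer syncs the dags/ folder of the environment's bucket onto the workers' Python path, so a layout like the sketch below (names taken from the question; the __init__.py is an assumption) makes the import resolvable:

    # Hypothetical layout inside the environment's dags/ folder:
    #
    #   dags/
    #     main.py          <- DAG file
    #     src/
    #       __init__.py    <- needed so `src` is importable as a package
    #       kuzzle.py
    #
    # main.py can then do the import from the question, because the dags/
    # folder itself is on sys.path on Composer workers:
    from src.kuzzle import KuzzleQuery  # module and class from the question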