Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines spanning clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python.
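
For example, a minimal sketch of such a DAG, using the Airflow 1.10-era API that the Composer versions discussed on this page ship with (the DAG id, schedule, and task logic are illustrative):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator


def greet():
    # Custom Python logic runs directly inside the task
    print("Hello from a custom callable")


with DAG(
    dag_id="example_dag",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
) as dag:
    say_hello = BashOperator(task_id="say_hello", bash_command="echo hello")
    run_greet = PythonOperator(task_id="run_greet", python_callable=greet)

    say_hello >> run_greet  # run_greet only starts after say_hello succeeds
```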

While technically you can do data processing directly within a task (an instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything: a container, BigQuery, Spark, etc.). Often you will then wait for that processing to complete using an Airflow sensor operator, possibly launch further dependent tasks, and so on.
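
A hedged sketch of that pattern, assuming BigQuery as the external system and a GCS sensor waiting on its output (all project, table, bucket, and object names are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.contrib.sensors.gcs_sensor import GoogleCloudStorageObjectSensor

with DAG(
    dag_id="offload_and_wait",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
) as dag:
    # The heavy lifting happens in BigQuery, not in the Airflow worker
    run_query = BigQueryOperator(
        task_id="run_query",
        sql="SELECT * FROM `my_project.my_dataset.source`",
        destination_dataset_table="my_project.my_dataset.target",
        use_legacy_sql=False,
    )

    # Block downstream tasks until the expected artifact appears in GCS
    wait_for_export = GoogleCloudStorageObjectSensor(
        task_id="wait_for_export",
        bucket="my-bucket",
        object="exports/target.csv",
    )

    run_query >> wait_for_export
```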

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip modules to install, hardware configurations, environment variables, etc. Cloud Composer allows overriding some but not all Airflow configuration settings.
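
Illustrative gcloud invocations for these customizations (the environment name, location, and values are placeholders; each update type must be issued as its own command):

```sh
# Install an extra pip package into the environment
gcloud composer environments update my-env \
    --location us-central1 \
    --update-pypi-package pandas==0.23.4

# Set an environment variable visible to the Airflow processes
gcloud composer environments update my-env \
    --location us-central1 \
    --update-env-variables FOO=bar

# Override one of the allowed Airflow configuration settings
gcloud composer environments update my-env \
    --location us-central1 \
    --update-airflow-configs core-dags_are_paused_at_creation=True
```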

Further technical details: Cloud Composer creates a Kubernetes (GKE) cluster for each Airflow environment. This is where your tasks run, but you don't have to manage the cluster yourself. You place your code in a specific Cloud Storage bucket, and Cloud Composer syncs it from there.
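
For example, one hedged way to push a DAG file into that bucket without looking up its name, assuming an environment called my-env in us-central1:

```sh
gcloud composer environments storage dags import \
    --environment my-env \
    --location us-central1 \
    --source my_dag.py
```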

1225 questions
3 votes • 1 answer

Airflow DAG - how to check BQ first (delete if necessary) and then run dataflow job?

I am using Cloud Composer to orchestrate ETL for files arriving in GCS and bound for BigQuery. I have a Cloud Function that triggers the DAG when a file arrives and passes the file name/location to the DAG. In my DAG I have 2 tasks:…
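
A hedged sketch of how those two tasks might be wired up (this is not the asker's code; all table, bucket, and template names are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_table_delete_operator import (
    BigQueryTableDeleteOperator,
)
from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator

with DAG(
    dag_id="gcs_to_bq_etl",
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,  # triggered externally by the Cloud Function
) as dag:
    # Drop the target table if it already exists; a no-op otherwise
    delete_if_exists = BigQueryTableDeleteOperator(
        task_id="delete_bq_table",
        deletion_dataset_table="my_project.my_dataset.my_table",
        ignore_if_missing=True,
    )

    # Launch the Dataflow job once the table check/delete is done
    run_dataflow = DataflowTemplateOperator(
        task_id="run_dataflow",
        template="gs://my-bucket/templates/my_template",
        parameters={"inputFile": "gs://my-bucket/incoming/file.csv"},
    )

    delete_if_exists >> run_dataflow
```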
3 votes • 1 answer

Cloud Composer missing variables file

I've been trying to import a JSON file of environment variables into a newly created Cloud Composer instance using the Airflow CLI, but when running the below I get the error "Missing variables file": gcloud composer environments run ${COMPOSER_NAME} \ …
Josh Laird • 6,974 • 7 • 38 • 69
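
A hedged sketch of one likely fix: the CLI command executes inside the environment, so the variables file has to exist there too. Uploading it to the environment's data/ folder makes it visible to the workers at /home/airflow/gcs/data/ (the location and file name are placeholders):

```sh
# Upload the JSON file into the environment's GCS data/ folder
gcloud composer environments storage data import \
    --environment ${COMPOSER_NAME} \
    --location us-central1 \
    --source variables.json

# Import it via the path under which workers see that folder
gcloud composer environments run ${COMPOSER_NAME} \
    --location us-central1 \
    variables -- --import /home/airflow/gcs/data/variables.json
```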
3 votes • 1 answer

Airflow tasks get stuck at “Scheduled” status and never start running during backfill

Trying to do some backfills, and all the DAG runs start up fine, but for some reason they can't get past a specific task; instead they get stuck in a "Scheduled" state. Not sure what "Scheduled" means and why they don't move to "Running". It works…
Tomas Jansson • 22,767 • 13 • 83 • 137
3 votes • 1 answer

Airflow DAGs not running on Google Cloud Composer: "Dependencies Blocking Task From Getting Scheduled"

I've just set up a Cloud Composer Environment on Python 3 and Composer image version composer-1.4.0-airflow-1.10.0. All settings are otherwise "stock"; i.e. no configuration overrides. I'm trying to test out an extremely simple DAG. It runs without…
3 votes • 2 answers

How to run an Airflow DAG a specific number of times?

How do I run an Airflow DAG a specified number of times? I tried using TriggerDagRunOperator, and this operator works for me. In the callable function we can check state and decide whether to continue. However, the current count and states need to be…
Omkara • 414 • 4 • 16
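
A hedged sketch of the pattern the question describes, with the run count kept in an Airflow Variable; the DAG id, Variable name, and limit are illustrative (Airflow 1.10-era TriggerDagRunOperator, whose callable receives the prospective dag_run_obj):

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dagrun_operator import TriggerDagRunOperator

MAX_RUNS = 5


def maybe_retrigger(context, dag_run_obj):
    count = int(Variable.get("self_trigger_count", default_var=0))
    if count >= MAX_RUNS:
        return None  # returning None skips the trigger
    Variable.set("self_trigger_count", count + 1)
    return dag_run_obj


with DAG(
    dag_id="self_triggering_dag",
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,  # driven by explicit triggers
) as dag:
    retrigger = TriggerDagRunOperator(
        task_id="retrigger_self",
        trigger_dag_id="self_triggering_dag",
        python_callable=maybe_retrigger,
    )
```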
3 votes • 2 answers

Does Cloud Composer require a /14 custom subnet?

I need my Composer environment to reach some on-prem resources over the VPN tunnel established between GCP and my network. I have my custom VPC network set up and running with a series of /20 subnets. The problem is I can't spin up a new Composer…
larruda • 105 • 1 • 5
3 votes • 3 answers

Connect to Cloud SQL for PostgreSQL from Cloud Composer

My question is about configuring Google Cloud Composer to reach Google Cloud SQL using the same network configuration in the same Google Cloud project. Cloud SQL is configured with a private IP associated with the default network. Cloud SQL config Cloud…
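
Once routing from Composer's network to the instance's private IP works, the database side is ordinary Postgres. A minimal hedged sketch, assuming an Airflow connection named cloudsql_postgres has been defined with the private IP as its host:

```python
from airflow.hooks.postgres_hook import PostgresHook


def check_cloudsql_connection():
    # The connection id is illustrative; configure it in the Airflow UI
    hook = PostgresHook(postgres_conn_id="cloudsql_postgres")
    return hook.get_first("SELECT version();")
```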
3 votes • 3 answers

Why shouldn't you run Kubernetes pods for longer than an hour from Composer?

The Cloud Composer documentation explicitly states that: Due to an issue with the Kubernetes Python client library, your Kubernetes pods should be designed to take no more than an hour to run. However, it doesn't provide any more context than…
3 votes • 2 answers

Airflow/Composer recommended folder structure

Do you have any recommendations for a Composer folder/directory structure? The way it should be structured differs from the way our internal Airflow server is set up right now. Based on Google documentation:…
Tuan Vu • 708 • 7 • 15
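
For reference, a hedged sketch of the layout the Composer documentation implies, rooted at the environment's GCS bucket (file names are illustrative):

```
<environment-bucket>/
    dags/            # DAG definitions, synced to the Airflow workers
        my_dag.py
        dependencies/    # shared helper modules importable from DAGs
            __init__.py
            helpers.py
    plugins/         # custom operators, hooks, and sensors
    data/            # visible to workers at /home/airflow/gcs/data
    logs/            # task logs written by the environment
```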
3 votes • 1 answer

Cloud Composer GKE Node upgrade results in Airflow task randomly failing

The problem: I have a managed Cloud Composer environment whose Kubernetes cluster master is on 1.9.7-gke.6. I tried to upgrade it (as well as the default-pool nodes) to 1.10.7-gke.1, since an upgrade was available. Since then, Airflow has been acting…
3 votes • 1 answer

Passing typesafe config conf files to DataProcSparkOperator

I am using Google Dataproc to submit Spark jobs and Google Cloud Composer to schedule them. Unfortunately, I am facing difficulties. I rely on .conf files (Typesafe config files) to pass arguments to my Spark jobs. I am using the following…
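
A hedged sketch of one way to ship a Typesafe .conf alongside a Dataproc Spark job: stage it with the files parameter and point the driver at it via extraJavaOptions (the cluster, jar, class, and paths are placeholders):

```python
from airflow.contrib.operators.dataproc_operator import DataProcSparkOperator

submit_spark_job = DataProcSparkOperator(
    task_id="submit_spark_job",
    cluster_name="my-cluster",
    main_class="com.example.Main",
    dataproc_spark_jars=["gs://my-bucket/jars/app-assembly.jar"],
    # `files` places the .conf in the job's working directory on the cluster
    files=["gs://my-bucket/conf/app.conf"],
    dataproc_spark_properties={
        "spark.driver.extraJavaOptions": "-Dconfig.file=app.conf",
    },
)
```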
3 votes • 3 answers

Airflow/Composer - template not found in zip packaged DAG

I'm having trouble getting a templated SQL file to work in Composer. I think the problem is related to the fact that I'm packaging the DAG as a zip file in order to include additional code. I started with this (just showing relevant parts): dag =…
pteehan • 807 • 9 • 19
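
One commonly suggested workaround for template resolution in zipped DAGs is to give the DAG an absolute template_searchpath instead of relying on paths relative to the DAG file; a hedged sketch (the path is illustrative):

```python
from datetime import datetime

from airflow import DAG

dag = DAG(
    dag_id="zipped_dag",
    start_date=datetime(2019, 1, 1),
    # Jinja looks here for .sql templates instead of inside the zip
    template_searchpath=["/home/airflow/gcs/dags/sql"],
)
```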
3 votes • 2 answers

Airflow, mark a task success or skip it before dag run

We have a huge DAG, with many small, fast tasks and a few big, time-consuming tasks. We want to run just part of the DAG, and the easiest way we've found is to not add the tasks we don't want to run. The problem is that our DAG has…
Pablo • 3,135 • 4 • 27 • 43
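
A hedged sketch of one way to make parts of a DAG skippable per run: gate the expensive branch behind a ShortCircuitOperator driven by an Airflow Variable (names are illustrative):

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import ShortCircuitOperator


def should_run_heavy_tasks():
    # When this returns False, everything downstream is skipped
    return Variable.get("run_heavy_tasks", default_var="true") == "true"


with DAG(
    dag_id="partially_skippable_dag",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
) as dag:
    gate = ShortCircuitOperator(
        task_id="gate_heavy_tasks",
        python_callable=should_run_heavy_tasks,
    )
    heavy_task = DummyOperator(task_id="heavy_task")  # stand-in for real work

    gate >> heavy_task
```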
3 votes • 1 answer

Run java Google Dataflow job from Cloud Composer

At the moment we're using an Airflow version we installed ourselves on Kubernetes, but the idea is to migrate to Cloud Composer. We're using Airflow to run Dataflow jobs using a customized version of DataFlowJavaOperator (via a plugin) because we…
stesua • 31 • 3
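
For comparison, a hedged sketch of the stock operator that the question's plugin customizes (the jar path, class, and options are placeholders):

```python
from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

run_dataflow_job = DataFlowJavaOperator(
    task_id="run_dataflow_job",
    jar="gs://my-bucket/pipelines/pipeline-bundled.jar",
    job_class="com.example.MyPipeline",
    options={
        "project": "my-project",
        "tempLocation": "gs://my-bucket/tmp",
    },
)
```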
3 votes • 1 answer

Can't spin up Composer env in Australian zone: "INVALID_ARGUMENT: Unexpected location: australia-southeast1"

I'm trying to create my Composer env in the Australian region. But I keep getting the following error: "INVALID_ARGUMENT: Unexpected location: australia-southeast1" Using the following command: gcloud beta composer environments create…
Graham Polley • 14,393 • 4 • 44 • 80