Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python.

While technically you can do data processing directly within a task (instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything - a container, BigQuery, Spark, etc). Often you will then wait for that processing to complete using an Airflow sensor operator, possibly launch further dependent tasks, etc.

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip modules to install, hardware configurations, environment variables, etc. Cloud Composer allows overriding some but not all Airflow configuration settings.
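For example, these kinds of customizations can be applied from the gcloud CLI; the environment name, location, package, and config values below are placeholders:

```shell
# Hypothetical environment name and location; substitute your own.

# Install an extra PyPI package into the environment:
gcloud composer environments update my-environment \
    --location us-central1 \
    --update-pypi-package "pandas>=2.0"

# Set an environment variable visible to Airflow processes:
gcloud composer environments update my-environment \
    --location us-central1 \
    --update-env-variables ENV_TIER=staging

# Override a (permitted) Airflow configuration setting,
# keyed as <section>-<property>:
gcloud composer environments update my-environment \
    --location us-central1 \
    --update-airflow-configs core-dags_are_paused_at_creation=True
```

Each update triggers an environment operation, so changes can take several minutes to roll out.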

Further technical details: Cloud Composer will create a Kubernetes cluster for each Airflow environment you create. This is where your tasks will be run, but you don't have to manage it. You will place your code within a specific bucket in Cloud Storage, and Cloud Composer will sync it from there.
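As a sketch, deploying a DAG is just a copy into that bucket; gcloud wraps this with a convenience command (environment and file names below are placeholders):

```shell
# Hypothetical environment and file names; substitute your own.

# Copy a DAG file into the environment's dags/ folder in Cloud Storage:
gcloud composer environments storage dags import \
    --environment my-environment \
    --location us-central1 \
    --source ./example_dag.py

# Equivalently, look up the environment's DAGs bucket prefix...
gcloud composer environments describe my-environment \
    --location us-central1 \
    --format="value(config.dagGcsPrefix)"
# ...which prints something like gs://<bucket>/dags, and copy directly:
# gsutil cp ./example_dag.py gs://<bucket>/dags/
```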

1225 questions
-1
votes
1 answer

Cloud composer import custom plugin to all existing dags

I am using Cloud Composer to schedule multiple DAGs. These DAGs are built dynamically using this method and they use custom plugins. I would like to know how to proceed when adding or modifying a plugin that concerns all DAGs (let's say it adds a…
Ferdi777
  • 317
  • 4
  • 14
-1
votes
1 answer

Broken DAG: [/home/airflow/gcs/dags/cdp/xox/cdp_box_audit.py] No module named 'dags.box'

I recently moved to Google Cloud Composer from a self-managed Airflow server. What is the recommended folder structure for DAGs, sub-DAGs, and config files? Error message: getting Broken DAG: [/home/airflow/gcs/dags/cdp/box/cdp_box_audit.py] No module named…
-1
votes
1 answer

Location of /home/airflow

I have specified 3 nodes when creating a cloud composer environment. I tried to connect to worker nodes via SSH but I am not able to find airflow directory in /home. So where exactly is it located?
-1
votes
1 answer

How to get end time of previous job

I have a task which is scheduled every few minutes. I want to implement logic where the new task starts where the previously successful task left off. More concretely, I use these time intervals to then query the database, so I don't miss…
marknorkin
  • 3,904
  • 10
  • 46
  • 82
-1
votes
1 answer

Cloud Composer spark-submit to an existing Hadoop cluster

I have been trying out Cloud Composer for a few days, but I have a task that requires spark-submitting jobs to our existing Hadoop cluster in YARN mode. Is this possible with Cloud Composer?
TJune
  • 1
-2
votes
1 answer

How can we set an environment variable of a pre-existing Composer environment using Terraform?

I am trying to create a Composer environment using Terraform. I want to use the bucket name created by the environment in my DAG, so I am trying to set it as an environment variable. Since the bucket name is only known after the Composer environment is created, how can…
-2
votes
1 answer

which GCP component to use to fetch data from an API

I'm a little confused about which GCP components to use. Here is my use case: daily, I need to fetch data from an external API (the API returns JSON data), store it in GCS, then load it into BigQuery. I have already created the Python script fetching the data and…
-2
votes
1 answer

Unpausing and Pausing a dag in python

Scenario: I make a POST call that triggers a process to export a file, this call returns an Export-ID. This process can take an unknown amount of time to complete, so I have to make a GET call using the Export-ID periodically to see if the process…
-3
votes
1 answer

Google Cloud Composer and Google Cloud SQL Proxy

I have a project with Cloud Composer and Cloud SQL. I am able to connect to Cloud SQL because I edited the YAML of airflow-sqlproxy-service and added my Cloud SQL instance to the SQL proxy used for the airflow-db, mapping it to port 3307. The workers can…