Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python.
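
For instance, a minimal sketch of a DAG with a single PythonOperator task (assuming Airflow 2.x; all names are illustrative):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def greet():
        # Plain Python runs inside the task; print output lands in the task log.
        print("Hello from Cloud Composer")

    with DAG(
        dag_id="example_dag",              # illustrative name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        greet_task = PythonOperator(task_id="greet", python_callable=greet)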

While technically you can do data processing directly within a task (an instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything: a container, BigQuery, Spark, etc.). Often you will then wait for that processing to complete using an Airflow sensor, and possibly launch further dependent tasks.
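
As a sketch of that pattern, a sensor might wait for an external system to write its output to Cloud Storage (assuming the Google provider package is installed; bucket and object names are hypothetical):

    from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

    # Wait until the external system writes its result file before moving on.
    wait_for_result = GCSObjectExistenceSensor(
        task_id="wait_for_result",
        bucket="my-results-bucket",    # hypothetical bucket
        object="output/result.csv",    # hypothetical object path
        poke_interval=60,              # re-check every 60 seconds
        timeout=60 * 60,               # give up after an hour
    )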

While Cloud Composer is managed, you can apply a variety of customizations, such as which PyPI packages to install, hardware configuration, and environment variables. Cloud Composer allows overriding some, but not all, Airflow configuration settings.

Further technical details: Cloud Composer provisions a Kubernetes cluster for each Airflow environment you create. This is where your tasks run, but you don't have to manage it. You place your code in a specific Cloud Storage bucket, and Cloud Composer syncs it from there.

1225 questions
6 votes · 1 answer

Airflow on Cloud Composer Cannot import module

I'm running a DAG test_dag.py which is structured in the following way in my Google Cloud Storage bucket:

    gcs-bucket/
        dags/
            test_dag.py
            dependencies/
                __init__.py
                dependency_1.py
                module1/
                …
Matt (1,368)
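
A hedged sketch of how such imports usually resolve on Composer: the dags/ folder itself is placed on the Python path, so assuming dependencies/ sits directly under dags/, the DAG would import relative to dags/:

    # In dags/test_dag.py, assuming dags/dependencies/__init__.py exists.
    # Composer puts the dags/ folder itself on sys.path, so imports are
    # written relative to dags/, not to the bucket root.
    from dependencies import dependency_1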
6 votes · 3 answers

FERNET_KEY configuration is missing when creating a new environment with the same DAGS

I'm using Composer (Airflow) in Google Cloud. I want to create a new environment and carry the same DAGs and Variables over from the old environment into the new one. To accomplish this I do the following: I check several of my variables and export them…
bpgeck (1,592)
6 votes · 1 answer

Airflow: Duplicate entry mysql integrity error when triggering a DAG run

I have two Airflow DAGs: scheduler and worker. Scheduler runs every minute, polls for new aggregation jobs, and triggers worker jobs. You can find the code for the scheduler job below. However, out of over 6,000 scheduler job runs, 30 failed with the…
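
For context, a duplicate-entry error on the dag_run table usually means two triggers produced the same run_id. A hedged workaround sketch, assuming Airflow 2's TriggerDagRunOperator and a hypothetical worker_dag:

    import uuid

    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    # Give every trigger a unique run_id so concurrent scheduler runs cannot
    # collide on the dag_run table's unique constraint.
    trigger_worker = TriggerDagRunOperator(
        task_id="trigger_worker",
        trigger_dag_id="worker_dag",                  # hypothetical DAG id
        trigger_run_id=f"triggered__{uuid.uuid4()}",  # guaranteed unique
    )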
6 votes · 1 answer

Airflow task retried after failure despite retries=0

I have an Airflow environment running on Cloud Composer (3 n1-standard-1 nodes; image version: composer-1.4.0-airflow-1.10.0; config override: core catchup_by_default=False; PyPI packages: kubernetes==8.0.1). During a DAG run, a few tasks (all…
D Cohen (157)
6 votes · 1 answer

How can I obtain suitable credentials in a cloud composer environment to make calls to the google sheets API?

I would like to be able to access data on a Google Sheet when running Python code via Cloud Composer; this is something I know how to do in several ways when running code locally, but moving to the cloud is proving challenging. In particular, I wish…
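
One commonly suggested approach, sketched here under the assumption that the environment's service account has been granted access to the sheet, is to build Sheets credentials from Application Default Credentials (spreadsheet id and range are placeholders):

    import google.auth
    from googleapiclient.discovery import build

    # Application Default Credentials resolve to the Composer environment's
    # service account; request the Sheets read-only scope explicitly.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/spreadsheets.readonly"]
    )
    service = build("sheets", "v4", credentials=credentials)
    values = (
        service.spreadsheets()
        .values()
        .get(spreadsheetId="your-spreadsheet-id",  # hypothetical id
             range="Sheet1!A1:C10")                # hypothetical range
        .execute()
    )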
6 votes · 1 answer

How to configure Google Cloud Composer cost-effectively

After some research and testing, we have decided to start using Google Cloud Composer. Since our current DAGs and tasks are relatively small and don't require the server to run continuously, I am looking into how to manage costs. Two questions: The…
dkapitan (859)
6 votes · 1 answer

Is it possible to install a GitHub repository on Google Cloud Composer

As the title says, we can list PyPI packages in a requirements.txt file and use the command gcloud beta composer environments update env_name --update-pypi-packages-from-file requirements.txt --location location to update the Cloud Composer environment. But…
5 votes · 2 answers

GCP Composer v1.18.6 and 2.0.10 incompatible with CloudSqlProxyRunner

In my Composer Airflow DAGs, I have been using the CloudSqlProxyRunner to connect to my Cloud SQL instance. However, after updating Google Cloud Composer from v1.18.4 to 1.18.6, my DAG started to encounter a strange error: [2022-04-22, 23:20:18 UTC]…
5 votes · 1 answer

Component Gateway activation on Dataproc does not work with the Composer (Airflow) operator airflow.providers.google.cloud.operators.dataproc

I'm trying to execute the DAG below. It seems that the operator creating a Dataproc cluster does not enable the optional components needed for Jupyter notebook and Anaconda. I found this code here: Component Gateway with DataprocOperator on…
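
For reference, a hedged sketch of a cluster_config that requests the optional components and Component Gateway, following the Dataproc API field names (project, region, cluster name, and image version are placeholders):

    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
    )

    # cluster_config follows the shape of the Dataproc ClusterConfig API.
    cluster_config = {
        "software_config": {
            "image_version": "1.5-debian10",  # placeholder image version
            "optional_components": ["JUPYTER", "ANACONDA"],
        },
        "endpoint_config": {
            "enable_http_port_access": True,  # turns on Component Gateway
        },
    }

    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id="my-project",      # placeholder
        region="us-central1",         # placeholder
        cluster_name="my-cluster",    # placeholder
        cluster_config=cluster_config,
    )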
5 votes · 2 answers

KubernetesPodOperator: how to use cmds, or cmds and arguments, to run multiple commands

I'm using GCP Composer to run an algorithm, and at the end of the stream I want to run a task that performs several operations, copying and deleting files and folders from a volume to a bucket. I'm trying to perform these copying and deleting…
Amit Lipman (687)
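
A common pattern, sketched here with a hypothetical image and paths (the import path also varies across provider versions), is to hand a single shell string to the container so several commands run in one task:

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    # Wrap several shell commands in a single `sh -c` invocation.
    copy_and_cleanup = KubernetesPodOperator(
        task_id="copy_and_cleanup",
        name="copy-and-cleanup",
        namespace="default",
        image="google/cloud-sdk:slim",  # hypothetical image providing gsutil
        cmds=["/bin/sh", "-c"],
        arguments=[
            "gsutil -m cp -r /mnt/data/* gs://my-bucket/output/"
            " && rm -rf /mnt/data/*"    # hypothetical paths
        ],
    )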
5 votes · 1 answer

Workload Identity & Service Accounts for Composer 2 / GKE Autopilot Cluster PodOperator tasks

I'm trying to run GKEStartPodOperator/KubernetesPodOperator tasks in a Composer 2 environment, which makes use of a GKE cluster in autopilot mode. We have an existing Composer 1 environment with a GKE cluster not in autopilot mode. Our tasks that…
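
Roughly, with Workload Identity the pod runs as a Kubernetes service account bound to a Google service account; a hedged sketch with hypothetical names:

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    # "my-ksa" must exist in the target namespace and carry the
    # iam.gke.io/gcp-service-account annotation pointing at the bound GSA.
    wi_task = KubernetesPodOperator(
        task_id="wi_task",
        name="wi-task",
        namespace="my-namespace",        # hypothetical namespace
        service_account_name="my-ksa",   # hypothetical KSA with a WI binding
        image="us-docker.pkg.dev/my-project/repo/app:latest",  # hypothetical
    )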
5 votes · 0 answers

Airflow: `'TaskInstance' object has no attribute 'task'` after retrieving task instances via DagRun.get_task_instances()

I'm running composer-1.16.6-airflow-1.10.15. For a daily scheduled DAG, I want to write a custom on_failure_notification that only sends a notification if a task instance has failed for several consecutive days. My plan is to get the failed task…
tsabsch (2,131)
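
A hedged sketch of such a callback that sidesteps the missing task attribute by reading only database-backed fields from the DagRun's task instances:

    from airflow.utils.state import State

    def on_failure_notification(context):
        # Task instances loaded from the database carry state and task_id,
        # but not the bound `task` attribute, so only those fields are read.
        dag_run = context["dag_run"]
        for ti in dag_run.get_task_instances(state=State.FAILED):
            print(f"Task {ti.task_id} failed in run {dag_run.run_id}")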
5 votes · 1 answer

Run a shell script file with Airflow on Google Cloud Composer

I have several multi-purpose shell scripts stored in .sh files. My intention is to build a few Airflow DAGs on Cloud Composer that will leverage these scripts. The DAGs would be made mostly of BashOperators that call the scripts with specific…
Vlad Gheorghe (470)
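
A hedged sketch of that setup: Composer mounts the environment bucket at /home/airflow/gcs, so a script uploaded under dags/ can be invoked by path (the script name is hypothetical; the trailing space avoids Jinja templating of the .sh path):

    from airflow.operators.bash import BashOperator

    # Composer mounts the environment bucket at /home/airflow/gcs, so a file
    # uploaded to gs://<bucket>/dags/scripts/etl.sh appears at this local path.
    # The trailing space stops Jinja from trying to load the .sh file as a
    # template.
    run_script = BashOperator(
        task_id="run_etl_script",
        bash_command="bash /home/airflow/gcs/dags/scripts/etl.sh ",
    )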
5 votes · 5 answers

Google Cloud Composer (Apache Airflow) cannot access log files

I'm running a DAG in Google Cloud Composer (hosted Airflow) which runs fine in Airflow locally. All it does is print "Hello World". However, when I run it through Cloud Composer I receive the error: *** Log file does not exist:…
Matt (1,368)
5 votes · 3 answers

What is the difference between GCP Kubeflow and GCP cloud composer?

I am learning GCP and came across Kubeflow and Google Cloud Composer. From what I have understood, it seems that both are used to orchestrate workflows, empowering the user to schedule and monitor pipelines on GCP. The only difference that I…
Nizam (340)