Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines spanning clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python.
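For illustration, a minimal sketch of such a DAG, assuming Airflow 2.x module paths; the names (my_pipeline, greet) are made up:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def greet():
        # A custom Python callable wrapped by a built-in operator.
        print("hello from a custom task")

    with DAG(
        dag_id="my_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extracting")
        transform = PythonOperator(task_id="transform", python_callable=greet)
        extract >> transform  # ">>" declares an edge of the graph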

While technically you can do data processing directly within a task (instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything - a container, BigQuery, Spark, etc). Often you will then wait for that processing to complete using an Airflow sensor operator, possibly launch further dependent tasks, etc.
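A hedged sketch of that pattern, assuming the Airflow 2.x Google provider package; the BigQuery job and the GCS marker object are invented for illustration:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

    with DAG("external_processing", start_date=datetime(2024, 1, 1),
             schedule_interval=None, catchup=False) as dag:
        # Hand the heavy lifting to BigQuery rather than doing it in-task.
        run_query = BigQueryInsertJobOperator(
            task_id="run_query",
            configuration={"query": {"query": "SELECT 1", "useLegacySql": False}},
        )
        # Gate downstream tasks on the processing's output appearing.
        wait_for_export = GCSObjectExistenceSensor(
            task_id="wait_for_export",
            bucket="my-bucket",          # hypothetical bucket
            object="exports/_SUCCESS",   # hypothetical marker object
        )
        run_query >> wait_for_export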

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip modules to install, hardware configurations, environment variables, etc. Cloud Composer allows overriding some but not all Airflow configuration settings.
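As a sketch of how those customizations surface at runtime, assuming pandas was added through the environment's PyPI packages list and MY_SETTING through its environment variables (both invented names):

    import os

    from airflow.configuration import conf

    def check_environment():
        import pandas  # available only if added to Composer's PyPI package list
        print("pandas", pandas.__version__)
        # Environment variables set on the environment reach every task.
        print("MY_SETTING =", os.environ.get("MY_SETTING"))
        # Overridden Airflow settings are read through the normal config API.
        print("parallelism =", conf.get("core", "parallelism"))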

Further technical details: Cloud Composer will create a Kubernetes cluster for each Airflow environment you create. This is where your tasks will be run, but you don't have to manage it. You will place your code within a specific bucket in Cloud Storage, and Cloud Composer will sync it from there.
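Deploying a DAG is therefore just a copy into that bucket; a minimal sketch with the google-cloud-storage client, using a made-up bucket name (the real one is shown in the environment's details page):

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("us-central1-my-env-1234-bucket")  # hypothetical
    # Composer periodically syncs the dags/ prefix to the Airflow workers.
    bucket.blob("dags/my_pipeline.py").upload_from_filename("my_pipeline.py")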

1225 questions
3 votes • 1 answer

ModuleNotFoundError: No module named 'airflow'

I'm using the Airflow PythonOperator to execute a Python Beam job using the Dataflow runner. The Dataflow job returns the error "ModuleNotFoundError: No module named 'airflow'". In the Dataflow UI, the SDK version being used when the job is called…
3 votes • 2 answers

How can I pass the yes flag to backfill in Google Composer?

I am trying to do a backfill in Google Composer using a gcloud composer command but I am struggling to pass a -y or --yes for a corresponding --reset_dagruns argument. The response I get is airflow: error: unrecognized arguments: -y. Command: gcloud…
Brice • 346
3 votes • 2 answers

Airflow scheduler does not schedule (or schedules slowly) when there are a lot of tasks

I am working with Airflow on Google Cloud Composer (version: composer-1.10.2-airflow-1.10.6). I realized that the scheduler doesn't schedule tasks when there are a lot of tasks to process (see Gantt view below) (don't pay attention to the…
Bibimrlt • 158
3 votes • 1 answer

How do I identify which DAG / task is causing Node memory issues?

Tasks are regularly failing in our DAGs, and after following Google's troubleshooting steps I've identified the underlying cause to be memory evictions due to insufficient memory. This matches what I'm seeing in the Memory utilization per node graph…
3 votes • 2 answers

Unable to access AWS s3 bucket from Private Google cloud composer

I am using GCP Cloud Composer (Airflow) to sync S3 files with a GCS bucket. When I set up a public Composer environment (public cluster), I am able to run the command "gsutil ls s3://bucket_name" and it lists out the files in it, but when I set up a private…
3 votes • 4 answers

Sync directory files to Google Cloud Composer dags/ folder

I'd like to sync the contents of a folder in my repository to the GCP Composer dags/ folder in a simple command. The gcloud composer CLI seems to have a command for this; however, it leaves a warning that support for wildcards is being removed. >>…
gavinest • 308
3 votes • 1 answer

How to trigger a google composer DAG on a pub/sub publish message?

A Google Cloud Function cannot be used to trigger the Composer DAG on a Pub/Sub message. I have tried the PubSubPullSensor: pull_messages = PubSubPullSensor( task_id="pull_messages", ack_messages=True, project='xxxx', …
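For reference, a sketch of how that sensor is typically wired into a DAG, assuming the Airflow 2.x Google provider (where the parameter is project_id rather than project); all resource names are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.sensors.pubsub import PubSubPullSensor

    with DAG("pubsub_triggered", start_date=datetime(2024, 1, 1),
             schedule_interval=None, catchup=False) as dag:
        pull_messages = PubSubPullSensor(
            task_id="pull_messages",
            project_id="my-project",          # placeholder project
            subscription="my-subscription",   # placeholder subscription
            ack_messages=True,
        )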
3 votes • 1 answer

How to export large data from Postgres to S3 using Cloud composer?

I have been using the Postgres to S3 operator to load data from Postgres to S3. But recently, I had to export a very large table, and my Airflow Composer environment fails without any logs; this could be because we are using the NamedTemporaryFile function of…
Minato • 452
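One hedged workaround sketch for that memory blow-up: stream the table through a server-side cursor in batches instead of materializing it at once. Hook classes assume the Airflow 2.x Postgres/Amazon providers; the table, bucket, and connection ids are placeholders:

    import csv
    import tempfile

    from airflow.providers.amazon.aws.hooks.s3 import S3Hook
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    def export_table_to_s3():
        pg = PostgresHook(postgres_conn_id="my_postgres")
        conn = pg.get_conn()
        # A named (server-side) cursor fetches rows in batches of itersize,
        # so the whole table never sits in worker memory at once.
        with conn.cursor(name="streaming_cursor") as cur, \
                tempfile.NamedTemporaryFile("w", newline="") as tmp:
            cur.itersize = 10_000
            cur.execute("SELECT * FROM big_table")
            writer = csv.writer(tmp)
            for row in cur:
                writer.writerow(row)
            tmp.flush()
            S3Hook(aws_conn_id="my_aws").load_file(
                filename=tmp.name, key="exports/big_table.csv",
                bucket_name="my-bucket", replace=True)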
3 votes • 2 answers

How do you schedule GCP AI Platform notebooks via Google Cloud Composer?

I've been tasked with automating the scheduling of some notebooks that run daily on AI Platform Notebooks via the Papermill operator, but actually doing this through Cloud Composer is giving me some trouble. Any help is appreciated!
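A minimal sketch of the operator the asker mentions, assuming the apache-airflow-providers-papermill package is installed; notebook paths and parameters are placeholders (reading gs:// paths additionally requires gcsfs):

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.papermill.operators.papermill import PapermillOperator

    with DAG("daily_notebook", start_date=datetime(2024, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        run_notebook = PapermillOperator(
            task_id="run_notebook",
            input_nb="gs://my-bucket/notebooks/report.ipynb",
            output_nb="gs://my-bucket/notebooks/out/report-{{ ds }}.ipynb",
            parameters={"run_date": "{{ ds }}"},  # templated at run time
        )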
3 votes • 1 answer

Google Dataflow: Import custom Python module

I am trying to run an Apache Beam pipeline (Python) within Google Cloud Dataflow, triggered by a DAG in Google Cloud Composer. The structure of my dags folder in the respective GCS bucket is as follows: /dags/ dataflow.py <- DAG dataflow/ …
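The usual fix is to ship the local package to the Dataflow workers via a setup file; a hedged sketch, assuming Beam's standard SetupOptions and Composer's /home/airflow/gcs/dags mount of the DAGs bucket (the exact path is an assumption):

    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    options = PipelineOptions(runner="DataflowRunner")
    # Point Dataflow at a setup.py that packages the dataflow/ module so
    # the workers can import it; the path mirrors the question's layout.
    options.view_as(SetupOptions).setup_file = "/home/airflow/gcs/dags/dataflow/setup.py"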
3 votes • 2 answers

Google Cloud Composer vCPU time Confusion

I've been trying Composer recently to run my pipeline, and found it cost surprisingly more than I expected. Here is what I got from the bill: Cloud Composer vCPU time in South Carolina: 148.749 hours [Currency conversion: USD to AUD…
IanJay • 373
3 votes • 1 answer

Best practices for multiple clients in Apache Airflow/Cloud Composer?

Problem: several tasks/jobs need to be executed per client; there are a lot of clients (hundreds); the tasks/jobs are nearly identical, only the config changes. Are there any best practices in Airflow to keep things simple? I'm thinking about (in no specific…
Jonny5 • 1,390
3 votes • 1 answer

Cloud Composer Airflow webserver 502 Server Error after "successfully" updating with PyPi packages

I receive this error soon after updating Cloud Composer with PyPI packages; it occurs consistently across the 4 configurations outlined below. Python packages added to Cloud Composer: forex_python>=1.5.0, datalab>=1.1.5. Airflow webserver error: 502 Server…
Skippy • 31
3 votes • 1 answer

How to use Google Data Prep API using Python

Google just launched the new API. Link is here. I want to know what the host is in this case, as they are using example.com and port 3005. I am also following this article, but it does not provide example code.
3 votes • 2 answers

Trigger Cloud Composer DAG with a Pub/Sub message

I am trying to create a Cloud Composer DAG to be triggered via a Pub/Sub message. There is the following example from Google which triggers a DAG every time a change occurs in a Cloud Storage…
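For context, a hedged sketch of the REST-API route (the alternative Google documents for newer environments): a Cloud Function body that triggers a DAG run through the Airflow stable REST API. The web server URL and DAG id are placeholders, and the calling service account needs permission on the environment:

    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    WEB_SERVER = "https://example-dot-us-central1.composer.googleusercontent.com"  # placeholder

    def trigger_dag(dag_id: str, conf: dict) -> None:
        # Default credentials of the function's service account.
        credentials, _ = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"])
        session = AuthorizedSession(credentials)
        resp = session.post(
            f"{WEB_SERVER}/api/v1/dags/{dag_id}/dagRuns",
            json={"conf": conf},
        )
        resp.raise_for_status()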