Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python.
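
For illustration, a minimal DAG sketch (all names here are made up, not taken from any question below) that chains a built-in operator with a custom one:

    from datetime import datetime

    from airflow import DAG
    from airflow.models import BaseOperator
    from airflow.operators.bash_operator import BashOperator  # 1.10.x import path


    class GreetOperator(BaseOperator):
        """A trivial custom operator defined in pure Python."""

        def execute(self, context):
            self.log.info('Hello from a custom operator')


    with DAG(dag_id='example_dag',
             start_date=datetime(2020, 1, 1),
             schedule_interval='@daily') as dag:
        extract = BashOperator(task_id='extract', bash_command='echo extracting')
        greet = GreetOperator(task_id='greet')
        extract >> greet  # greet runs only after extract succeeds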

While technically you can do data processing directly within a task (an instantiated operator), more often you will want a task to invoke processing in another system (which could be anything: a container, BigQuery, Spark, etc.). Often you will then wait for that processing to complete using an Airflow sensor operator, and then possibly launch further dependent tasks.
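
A hedged sketch of that pattern, assuming 1.10-era contrib import paths and made-up bucket/object names: a query runs inside BigQuery, and a sensor then waits for its exported output to appear in Cloud Storage.

    from airflow.contrib.operators.bigquery_operator import BigQueryOperator
    from airflow.contrib.sensors.gcs_sensor import GoogleCloudStorageObjectSensor

    # The heavy lifting happens inside BigQuery, not on the Airflow worker.
    run_query = BigQueryOperator(
        task_id='run_query',
        sql='SELECT COUNT(*) FROM `my_dataset.my_table`',  # illustrative query
        use_legacy_sql=False)

    # Poke GCS until the downstream export lands (bucket/object are made up).
    wait_for_export = GoogleCloudStorageObjectSensor(
        task_id='wait_for_export',
        bucket='my-output-bucket',
        object='exports/result.csv')

    run_query >> wait_for_export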

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip modules to install, hardware configurations, environment variables, etc. Cloud Composer allows overriding some but not all Airflow configuration settings.
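
For example (a sketch; the environment name, location, and values are placeholders), these customizations are typically applied through gcloud:

    $ gcloud composer environments update my-env --location us-central1 \
          --update-pypi-packages-from-file requirements.txt
    $ gcloud composer environments update my-env --location us-central1 \
          --update-env-variables MY_FEATURE_FLAG=on
    $ gcloud composer environments update my-env --location us-central1 \
          --update-airflow-configs core-dags_are_paused_at_creation=True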

Further technical details: Cloud Composer will create a Kubernetes cluster for each Airflow environment you create. This is where your tasks will be run, but you don't have to manage it. You will place your code within a specific bucket in Cloud Storage, and Cloud Composer will sync it from there.
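
Concretely (a sketch; the environment name and location are placeholders), deploying a DAG amounts to copying it into that bucket:

    $ gcloud composer environments describe my-env --location us-central1 \
          --format 'value(config.dagGcsPrefix)'    # prints gs://<bucket>/dags
    $ gsutil cp my_dag.py gs://<bucket>/dags/      # Composer syncs it to the workers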

1225 questions
4
votes
1 answer

How can we use GCSToSFTPOperator in a GCP Composer environment?

I want to use GCSToSFTPOperator in my GCP Composer environment. We have Airflow version 1.10.3, composer-1.8.3-airflow-1.10.3 (I have upgraded from 1.10.2 to 1.10.3) in the GCP Composer environment. GCSToSFTPOperator is present in the latest release…
Bhagesh Arora
  • 547
  • 2
  • 12
  • 30
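
For reference, a hedged sketch of the operator's usage as it appears in recent google provider releases (bucket, paths, and connection IDs are placeholders); on an older image such as composer-1.8.3-airflow-1.10.3 the class is not bundled, so it would have to be vendored in as a plugin or the environment upgraded:

    from airflow.providers.google.cloud.transfers.gcs_to_sftp import GCSToSFTPOperator

    copy_to_sftp = GCSToSFTPOperator(
        task_id='gcs_to_sftp',
        source_bucket='my-bucket',            # placeholder
        source_object='exports/data.csv',     # placeholder
        destination_path='/upload/data.csv',  # remote path on the SFTP host
        sftp_conn_id='sftp_default',
        gcp_conn_id='google_cloud_default')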
4
votes
0 answers

Airflow: Google Cloud Composer: TypeError: Object of type 'bytes' is not JSON serializable

I'm using MySqlToGoogleCloudStorageOperator as below. DAG task details: def gen_export_table_task(table_config): export_task = MySqlToGoogleCloudStorageOperator(task_id='export_dag_run_to_gcs', …
DataVishesh
  • 197
  • 1
  • 5
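
This error usually means a raw bytes value (e.g. from a BLOB or binary column) reached the JSON encoder under Python 3. A hedged, untested sketch of one workaround: cast binary columns to text in the SQL itself so no bytes ever reach json.dumps (all names below are placeholders):

    from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator

    export_task = MySqlToGoogleCloudStorageOperator(
        task_id='export_dag_run_to_gcs',
        mysql_conn_id='mysql_default',
        # CONVERT(... USING utf8) makes MySQL return strings instead of bytes,
        # so the operator's JSON serialization never sees undecodable values.
        sql='SELECT id, CONVERT(payload USING utf8) AS payload FROM my_table',
        bucket='my-export-bucket',
        filename='exports/my_table_{}.json')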
4
votes
1 answer

Workflow scheduling on GCP Dataproc cluster

I have some complex Oozie workflows to migrate from on-prem Hadoop to GCP Dataproc. Workflows consist of shell scripts, Python scripts, Spark-Scala jobs, Sqoop jobs, etc. I have come across some potential solutions incorporating my workflow…
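
On the Composer side, the usual shape for this is an ephemeral-cluster DAG: create a Dataproc cluster, run the jobs, tear it down. A hedged sketch using 1.10 contrib operators (project, cluster, and jar names are made up):

    from airflow.contrib.operators.dataproc_operator import (
        DataprocClusterCreateOperator,
        DataProcSparkOperator,
        DataprocClusterDeleteOperator,
    )

    create_cluster = DataprocClusterCreateOperator(
        task_id='create_cluster',
        cluster_name='ephemeral-etl',
        project_id='my-project',
        num_workers=2,
        zone='us-central1-a')

    spark_job = DataProcSparkOperator(
        task_id='spark_job',
        cluster_name='ephemeral-etl',
        main_class='com.example.EtlJob',
        dataproc_spark_jars=['gs://my-bucket/jars/etl-job.jar'])

    delete_cluster = DataprocClusterDeleteOperator(
        task_id='delete_cluster',
        cluster_name='ephemeral-etl',
        project_id='my-project',
        trigger_rule='all_done')  # tear down even if the job fails

    create_cluster >> spark_job >> delete_cluster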
4
votes
1 answer

Why are 2 Pub/Sub topics & subscriptions created automatically while creating a Cloud Composer environment?

I have noticed that 2 Pub/Sub topics & subscriptions get created automatically while creating a Cloud Composer environment, so what is the need for Pub/Sub here, and how is the internal architecture of Composer related to Pub/Sub? I need this…
4
votes
3 answers

Connecting to Cloud SQL PostgreSQL from Cloud Composer

I have a Google Cloud project with VPN-enabled connectivity and a Google Cloud SQL (PostgreSQL) database instance with the same VPN connectivity, along with SSL enabled. Cloud SQL has both public and private IP addresses. The public IP I used for connecting…
lourdu rajan
  • 329
  • 1
  • 5
  • 24
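
A minimal sketch of the Airflow side, assuming an Airflow connection (named 'cloudsql_pg_private' here, hypothetically) whose host is the instance's private IP; whether it is reachable still depends on the VPC/VPN setup:

    from airflow.hooks.postgres_hook import PostgresHook
    from airflow.operators.python_operator import PythonOperator

    def check_db():
        # 'cloudsql_pg_private' is a hypothetical connection pointing at the
        # Cloud SQL private IP, with SSL options carried in its extras.
        hook = PostgresHook(postgres_conn_id='cloudsql_pg_private')
        print(hook.get_first('SELECT version()'))

    check_task = PythonOperator(task_id='check_db', python_callable=check_db)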
4
votes
1 answer

Long-running Airflow task gets incorrectly marked as failed due to hostname mismatch

I have a long-running Cloud Composer Airflow task that kicks off a job using the KubernetesPodOperator. Sometimes it finishes successfully after about two hours, but more often it gets marked as failed with the following error in the Airflow worker…
4
votes
4 answers

template_searchpath gives a TemplateNotFound error in Airflow and cannot find the SQL script

I have a DAG described like this: tmpl_search_path = '/home/airflow/gcs/sql_requests/' with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag: create_table =…
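
A hedged sketch of the working pattern (the operator and file name are illustrative): Airflow resolves the relative name against each template_searchpath directory because .sql is a templated extension.

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator

    default_args = {'start_date': datetime(2020, 1, 1)}
    tmpl_search_path = '/home/airflow/gcs/sql_requests/'  # synced from the env's bucket

    with DAG(dag_id='pipeline',
             default_args=default_args,
             template_searchpath=[tmpl_search_path]) as dag:
        create_table = BigQueryOperator(
            task_id='create_table',
            sql='create_table.sql',  # resolved via template_searchpath
            use_legacy_sql=False)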
4
votes
2 answers

How to set proper permissions to run KubernetesPodOperator in Cloud Composer?

I am trying to run a simple KubernetesPodOperator in my Composer environment as per the documentation here. The Airflow runtime is failing due to lack of permissions for the user "default". So, how do I properly create an environment or set…
Alan Borsato
  • 248
  • 2
  • 13
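
For context, a minimal KubernetesPodOperator sketch along the lines of the documentation's example (image and commands are placeholders); note that Kubernetes RBAC and service-account permissions apply per namespace, and 'default' is the namespace the docs' example uses:

    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    pod_task = KubernetesPodOperator(
        task_id='pod_task',
        name='pod-task',       # pod name in the cluster
        namespace='default',   # namespace used by the docs' example
        image='ubuntu:18.04',  # placeholder image
        cmds=['echo'],
        arguments=['hello from inside a pod'])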
4
votes
1 answer

Accessing Kubernetes Secret from Airflow KubernetesPodOperator

I'm setting up an Airflow environment on Google Cloud Composer for testing. I've added some secrets to my namespace, and they show up fine: $ kubectl describe secrets/eric-env-vars Name: eric-env-vars Namespace: eric-dev Labels: …
Eric Fulmer
  • 706
  • 2
  • 6
  • 23
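
A hedged sketch of wiring such a secret into a pod as an environment variable (the key name 'db-password' and env var 'DB_PASSWORD' are hypothetical; the secret and namespace names come from the question):

    from airflow.contrib.kubernetes.secret import Secret
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    env_secret = Secret(
        deploy_type='env',            # expose as an env var inside the pod
        deploy_target='DB_PASSWORD',  # env var name in the container (hypothetical)
        secret='eric-env-vars',       # Kubernetes Secret name from the question
        key='db-password')            # key within that Secret (hypothetical)

    use_secret = KubernetesPodOperator(
        task_id='use_secret',
        name='use-secret',
        namespace='eric-dev',
        image='ubuntu:18.04',
        cmds=['sh', '-c', 'test -n "$DB_PASSWORD" && echo secret is set'],
        secrets=[env_secret])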
4
votes
1 answer

How to pass dynamic arguments to an Airflow operator?

I am using Airflow to run Spark jobs on Google Cloud Composer. I need to create a cluster (YAML parameters supplied by the user) and a list of Spark jobs (job params also supplied by per-job YAML). With the Airflow API I can read YAML files and push…
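
One common route is XCom: a task parses the YAML and pushes the values, and downstream templated fields pull them at render time. A hedged sketch (the file path and keys are made up):

    import yaml
    from airflow.operators.python_operator import PythonOperator

    def parse_config():
        with open('/home/airflow/gcs/data/job_config.yaml') as f:
            return yaml.safe_load(f)  # the return value is pushed to XCom

    parse = PythonOperator(task_id='parse_config', python_callable=parse_config)

    # Any *templated* field of a downstream operator can then use:
    #   "{{ ti.xcom_pull(task_ids='parse_config')['num_workers'] }}"
    # Non-templated constructor arguments cannot be filled this way; those
    # have to be resolved in Python before the operator is instantiated.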
4
votes
1 answer

Unable to publish a Pub/Sub message in Airflow with Python 3

I am unable to publish using the PubSubHook in Airflow with Python 3. Everything works perfectly with Python 2, but with Python 3 I get this error: {models.py:1760} ERROR - Object of type 'bytes' is not JSON serializable. It seems that encoding the…
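
A hedged sketch of the usual remedy: the Pub/Sub REST API expects the message data as a base64-encoded string, so under Python 3 the bytes must be encoded and then decoded back to str before publishing (project and topic names are placeholders):

    from base64 import b64encode

    from airflow.contrib.hooks.gcp_pubsub_hook import PubSubHook

    hook = PubSubHook()
    message = {'data': b64encode(b'hello world').decode('utf-8')}  # str, not bytes
    hook.publish(project='my-project', topic='my-topic', messages=[message])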
4
votes
1 answer

How to pass Spark job properties to DataProcSparkOperator in Airflow?

I am trying to execute a Spark jar on Dataproc using Airflow's DataProcSparkOperator. The jar is located on GCS, and I am creating a Dataproc cluster on the fly and then executing this jar on the newly created Dataproc cluster. I am able to execute this…
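
In the 1.10 contrib operator, Spark properties are passed through the dataproc_spark_properties dict; a hedged sketch (cluster, class, and jar names are placeholders):

    from airflow.contrib.operators.dataproc_operator import DataProcSparkOperator

    spark_task = DataProcSparkOperator(
        task_id='spark_task',
        cluster_name='my-ephemeral-cluster',
        main_class='com.example.Main',
        dataproc_spark_jars=['gs://my-bucket/jars/app.jar'],
        dataproc_spark_properties={
            'spark.executor.memory': '4g',  # illustrative values
            'spark.driver.memory': '2g'})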
4
votes
1 answer

GCP Composer (Airflow) operator

I'm using the GCP Composer API (Airflow) and my DAG to scale up the number of workers keeps returning the error below: Broken DAG: [/home/airflow/gcs/dags/cluster_scale_workers.py] 'module' object has no attribute 'DataProcClusterScaleOperator'…
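
Two things worth checking here, sketched below: the contrib module spells the class DataprocClusterScaleOperator (lower-case "proc", unlike the job operators), and the class is absent from early 1.10.x releases, so the Composer image version matters too. Cluster and project names are placeholders:

    from airflow.contrib.operators.dataproc_operator import DataprocClusterScaleOperator

    scale_up = DataprocClusterScaleOperator(
        task_id='scale_up',
        cluster_name='my-cluster',
        project_id='my-project',
        num_workers=10,
        graceful_decommission_timeout='1h')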
4
votes
3 answers

Google Cloud Composer variables do not propagate to Airflow

I am trying to run the example in the Google Cloud Composer documentation and I find two main issues: the environment variables, when created via the gcloud command line or the web interface, do not propagate to the Airflow layer, meaning that the…
Picarus
  • 760
  • 1
  • 10
  • 25
4
votes
1 answer

Internal server error in Google Composer web UI [Error code 28]

We are using Google Composer for workflow orchestration; randomly we get an "An internal server error occurred while authorizing your request. Error code 28" message while opening the web UI. We don't know the cause of this issue. How to fix…
SANN3
  • 9,459
  • 6
  • 61
  • 97