Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operators or define your own in pure Python.
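
For orientation, here is a minimal sketch of what such a DAG file can look like. The DAG id, schedule, and commands are illustrative placeholders, not taken from any question below, and import paths vary between Airflow versions (the paths shown follow Airflow 2.x).

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def greet():
        # Any plain Python callable can back a task.
        print("hello from Cloud Composer")

    with DAG(
        dag_id="example_composer_dag",      # name shown in the Airflow UI
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        say_hello = BashOperator(task_id="say_hello", bash_command="echo hello")
        greet_task = PythonOperator(task_id="greet", python_callable=greet)

        # Define the dependency: greet runs only after say_hello succeeds.
        say_hello >> greet_task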

While technically you can do data processing directly within a task (instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything - a container, BigQuery, Spark, etc). Often you will then wait for that processing to complete using an Airflow sensor operator, possibly launch further dependent tasks, etc.
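
A hedged sketch of that "delegate, then wait" pattern is below; the bucket name, object path, and the submitted command are hypothetical, and on older Composer images the same sensor is exposed under airflow.contrib rather than airflow.providers.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

    with DAG(
        dag_id="delegate_and_wait",
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        # Kick off processing in another system. Here it is only a placeholder
        # shell command; in practice it could be a BigQuery, Dataproc, or
        # KubernetesPod operator instead.
        submit_job = BashOperator(
            task_id="submit_job",
            bash_command="echo 'submitting external job'",
        )

        # Wait for the external system to drop its result object into GCS.
        wait_for_output = GCSObjectExistenceSensor(
            task_id="wait_for_output",
            bucket="my-results-bucket",     # hypothetical bucket
            object="output/part-00000",     # hypothetical object
            poke_interval=60,               # re-check once a minute
        )

        submit_job >> wait_for_output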

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip modules to install, hardware configurations, environment variables, etc. Cloud Composer allows overriding some but not all Airflow configuration settings.

Further technical details: Cloud Composer will create a Kubernetes cluster for each Airflow environment you create. This is where your tasks will be run, but you don't have to manage it. You will place your code within a specific bucket in Cloud Storage, and Cloud Composer will sync it from there.

1225 questions
0
votes
1 answer

Cloud composer parse json

I have imported a simple json file into the data folder. What's the best way to load, parse and use params in the json?
JY2k
  • 2,879
  • 1
  • 31
  • 60
0
votes
2 answers

Airflow start multiple concurrent generic tasks

Trying to run a few tasks concurrently on Cloud Composer: arr = {} for i in xrange(3): print("i: " + str(i)) command_formatted = command_template.format(str(i)) create_training_instance = bash_operator.BashOperator( …
JY2k
  • 2,879
  • 1
  • 31
  • 60
0
votes
0 answers

Order of task execution in DAG in Google Cloud Composer impacts whether task is executed

I'm new to Google Cloud Composer and have run into what seems to be a strange issue in the DAG that I've created. I have a process which takes a tar.gz file from Cloud Storage, re-zips it as a .gz file, and then loads the .gz file to BigQuery. …
Majobber
  • 35
  • 5
0
votes
2 answers

Airflow DataprocClusterCreateOperator

In the Airflow DataprocClusterCreateOperator settings, is there a way to set the primary disk type for the master and workers to pd-ssd? The default setting is standard. I looked through the documentation and couldn't find any such parameter.
Balaji
  • 111
  • 4
0
votes
2 answers

Monitoring the Airflow web server when using Google Cloud Composer

How can I monitor the Airflow web server when using Google Cloud Composer? If the web server goes down or crashes due to an error, I would like to receive an alert.
0
votes
3 answers

cloud composer + cloud ml engine tutorial?

I couldn't find a tutorial for Cloud ML Engine + Airflow. Could someone please help me deploy a Cloud ML Engine model and orchestrate it with Airflow to run training with new data every hour?
AVR
  • 83
  • 9
0
votes
1 answer

Using Google Cloud Composer Rest API OR Node.js Client

Hi, I am new to Google Cloud Platform and Cloud Composer. I want to create a Cloud Composer environment using code, but before that I have the following questions: Is there any way to create a Cloud Composer environment using its REST API? Using Try API…
Sarang Shinde
  • 717
  • 3
  • 7
  • 24
0
votes
0 answers

Can't do MySQL dump from Google Cloud Composer

I'm trying to create a simple DAG that dumps MySQL to a bucket like this: extract = MySqlToGoogleCloudStorageOperator( task_id='extract_data', mysql_conn_id='mysql_instance_connection', …
Rob
  • 3,333
  • 5
  • 28
  • 71
0
votes
1 answer

Google Composer: dag_id could not be found

I created a collection of DAGs dynamically (using the same .py file for all of them). There is one build-DAG that I cannot run: airflow.exceptions.AirflowException: dag_id could not be found: `build-DAG`. Either the dag did not exist or it failed to…
edduuar
  • 21
  • 4
0
votes
1 answer

Cloud composer unstable UI

The Airflow UI randomly fails to load and a 503 Google error message is shown. It's getting really hard and annoying to navigate the Airflow UI. Is this a known issue? After searching the internet for a long time, I did not get any leads…
Tameem
  • 408
  • 7
  • 19
0
votes
1 answer

PythonOperator task hangs accessing Cloud Storage and is stuck as SCHEDULED

One of the tasks in my DAG sometimes hangs when accessing Cloud Storage. It seems the code stops at the download function here: hook = GoogleCloudStorageHook(google_cloud_storage_conn_id='google_cloud_default') for input_file in hook.list(bucket,…
TAPeri
  • 13
  • 1
  • 6
0
votes
2 answers

"This DAG seems to be existing only locally. The master scheduler doesn't seem to be aware of its existence."

I started experimenting with Google Cloud Composer, where I deployed a few DAGs. One of my DAGs, with an info statement indicating "This DAG seems to be existing only locally. The master scheduler doesn't seem to be aware of its existence.", cannot run, even…
Thibault Clement
  • 2,360
  • 2
  • 13
  • 17
0
votes
1 answer

Connect external workers to Cloud Composer airflow

Is it possible to connect an external worker that is not part of the Cloud Composer Kubernetes cluster? Use case would be connecting a box in a non-cloud data center to a Composer cluster.
David Adrian
  • 1,079
  • 2
  • 9
  • 24
0
votes
1 answer

Access bucket from another project?

I have a script in a VM that writes data to a bucket in another project. I want to schedule this script with Airflow, but I have an IAM access problem when the script needs to write data: AccessDeniedException: 403…
MohamedLEGH
  • 309
  • 1
  • 11
0
votes
1 answer

Does Cloud Composer have failover?

I've read the Cloud Composer overview (https://cloud.google.com/composer/) and documentation (https://cloud.google.com/composer/docs/). It doesn't seem to mention failover. I'm guessing it does, since it runs on a Kubernetes cluster. Does it? By…
cryanbhu
  • 4,780
  • 6
  • 29
  • 47