Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python.
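The "DAG of operators" idea can be sketched in plain Python. This is not the Airflow API, just an illustration of how Airflow's `>>` operator (which `BaseOperator` overloads the same way) builds a dependency graph between tasks; all names here are hypothetical:

```python
# Plain-Python sketch (NOT the Airflow API): each "operator" is a node,
# and `a >> b` records that b depends on a, forming a directed acyclic graph.
class Op:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` means "b runs after a", mirroring Airflow's chaining syntax.
        self.downstream.append(other)
        return other  # returning `other` lets chains like a >> b >> c work

extract = Op("extract")
transform = Op("transform")
load = Op("load")
extract >> transform >> load  # chain: extract -> transform -> load

print([op.task_id for op in extract.downstream])  # ['transform']
```

In real Airflow, the scheduler walks this graph and only runs a task once all of its upstream tasks have succeeded.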

While technically you can do data processing directly within a task (instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything - a container, BigQuery, Spark, etc). Often you will then wait for that processing to complete using an Airflow sensor operator, possibly launch further dependent tasks, etc.
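The sensor pattern described above boils down to "kick off work in another system, then poll until it reports done." A minimal plain-Python sketch of that loop (names are hypothetical; Airflow's real sensors add scheduling, retries, and reschedule mode on top of this idea):

```python
import time

def wait_for_completion(check_done, poke_interval=1.0, timeout=10.0):
    """Poll `check_done()` until it returns True, like an Airflow sensor.

    check_done: zero-arg callable returning True once the external job
    (a BigQuery load, a Spark job, a container, ...) has finished.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check_done():
            return True
        time.sleep(poke_interval)
    raise TimeoutError("external job did not finish in time")

# Simulated external system: reports "done" on the third status check.
status = {"polls": 0}
def fake_job_done():
    status["polls"] += 1
    return status["polls"] >= 3

print(wait_for_completion(fake_job_done, poke_interval=0.01))  # True
```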

While Cloud Composer is managed, you can apply a variety of customizations, such as specifying which pip modules to install, hardware configurations, environment variables, etc. Cloud Composer allows overriding some but not all Airflow configuration settings.

Further technical details: Cloud Composer creates a Kubernetes cluster for each Airflow environment you create. This is where your tasks are run, but you don't have to manage it. You place your code in a specific Cloud Storage bucket, and Cloud Composer syncs it from there.

1225 questions
4
votes
3 answers

Can I restart a Cloud Composer Environment?

I've been using Google Cloud Composer for a few days now, mainly to move data from MySQL to BigQuery, and it was working fine. At some point it stopped working: running tasks run for a very long time and then fail; tasks don't start; new DAGs have the…
Ary Jazz
  • 1,576
  • 1
  • 16
  • 25
4
votes
1 answer

cloud composer build logs, where are they?

I am trying to install PyPI dependencies on Cloud Composer following this guide. The build failed, and the error message says: name: "operations/14021472-6dbe-42b3-8ec1-ba7ac62be60e" done: true sequence_number: 1 error { code: 0 message: "The…
stackoverflower
  • 3,885
  • 10
  • 47
  • 71
4
votes
2 answers

How to apt-get install packages on kubernetes for Google cloud composer

I'm using Cloud Composer to orchestrate my Airflow instance but am not sure how to install packages for the Airflow worker bash. Previously I was running Airflow on a Google Compute Engine instance using Docker; it was easy to specify requirements via…
D_usv
  • 433
  • 7
  • 21
4
votes
1 answer

How do I change the log level for airflow/composer

The INFO logging on Airflow is way too much for my taste, so how do I change the log level?
Tomas Jansson
  • 22,767
  • 13
  • 83
  • 137
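One common approach (hedged: the exact section depends on the Airflow version in the environment) is to change the log level through Cloud Composer's Airflow configuration overrides. In Airflow 2.x the setting lives in the `logging` section; in 1.10.x it was under `core`:

```ini
; Airflow 2.x - airflow.cfg override (settable via Composer's
; "Airflow configuration overrides")
[logging]
logging_level = WARNING

; Airflow 1.10.x equivalent
; [core]
; logging_level = WARNING
```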
3
votes
1 answer

DataprocInstantiateInlineWorkflowTemplateOperator : Error in pysparkJob template

Hello fellow Stackoverflowers, I have been trying to use the DataprocInstantiateInlineWorkflowTemplateOperator to run a PySpark job. Sadly, after following all the documentation, I am getting an error in Composer: ValueError: Protocol message OrderedJob…
3
votes
1 answer

BigQuery running in Airflow is running query in incorrect region even though different region is explicitly specified

I'm trying to set DEFAULT_TABLE_EXPIRATION_DAYS=7 on BigQuery schemas/datasets using a FOR loop. All my datasets are located in multi-region EU and I'm getting a list of datasets from the information schema. Here's the query I planned to use for…
3
votes
1 answer

How do I add a private python package to cloud composer's requirements?

I'm trying to install a private Python package in the Google Cloud Composer environment. I usually install the package using a personal access token. That works with plain pip: pip install git+https://$TOKEN@github.com/org/repo.git@main works as…
3
votes
1 answer

How do I import Airflow operators for version 2.2.5?

I have just upgraded my Airflow to 2.2.5 and I can't use the EmptyOperator. It should be simple: from airflow.operators.empty import EmptyOperator, but I get the error ModuleNotFoundError: No module named 'airflow.operators.empty'. I also tried: from…
CClarke
  • 503
  • 7
  • 18
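A likely explanation, hedged: `EmptyOperator` was only added in Airflow 2.3.0, so on 2.2.x the equivalent no-op operator is `DummyOperator`. A version-tolerant import sketch (the final `None` fallback is only there so the snippet also runs where Airflow isn't installed at all):

```python
# EmptyOperator exists only from Airflow 2.3.0; DummyOperator is the
# equivalent no-op operator on earlier 2.x versions.
try:
    from airflow.operators.empty import EmptyOperator  # Airflow >= 2.3.0
except ImportError:
    try:
        from airflow.operators.dummy import DummyOperator as EmptyOperator  # Airflow 2.0-2.2
    except ImportError:
        EmptyOperator = None  # Airflow not installed (e.g. linting locally)

print(EmptyOperator)
```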
3
votes
1 answer

Trigger Cloud Composer Using Google Cloud Function

I have run the exact code below but get an error when attempting to trigger the DAG using a Cloud Function. The error and code are described below: gcs-dag-trigger-function 8bhxprce8hze Traceback (most recent call last): File…
3
votes
2 answers

dbt and google cloud composer PyPI dependency issues

I am currently running Google Cloud Composer with Composer version 2.0.9 and Airflow version 2.1.4. I am trying to install the most recent version of dbt (1.0.4 for core and 1.0.0 for the BigQuery plugin). Because Cloud Composer images have…
dko512
  • 411
  • 4
  • 15
3
votes
2 answers

Efficiently set task_concurrency for all tasks in a DAG in airflow

My requirement: I want to avoid overlapping runs of the same task in Airflow 2.1.4. The following run of a task should only start after its preceding task run has finished (success or failure are both fine). I found this comprehensive answer, but…
Chris
  • 31
  • 2
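In Airflow, any `BaseOperator` argument placed in a DAG's `default_args` is applied to every task, so `task_concurrency=1` (renamed `max_active_tis_per_dag` in later 2.x releases) only needs to be written once. A plain-Python sketch of that merging mechanism (hypothetical classes, not the real Airflow internals):

```python
# Sketch of Airflow's default_args behavior: arguments set once on the DAG
# are merged into every task, with explicit per-task arguments winning.
class Task:
    def __init__(self, task_id, default_args=None, **kwargs):
        merged = dict(default_args or {})
        merged.update(kwargs)  # explicit args override the defaults
        self.task_id = task_id
        self.task_concurrency = merged.get("task_concurrency")

default_args = {"task_concurrency": 1}  # no overlapping runs of any task
tasks = [Task(t, default_args=default_args) for t in ("extract", "load")]
print([t.task_concurrency for t in tasks])  # [1, 1]
```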
3
votes
2 answers

Apache Airflow/Composer : how to connect to https using http connector with untrusted certificate

I am looking to connect to an external API through HTTPS with Airflow. To do that, I configured my HTTP Airflow connector per the documentation. I set my host to my URL: myurl.com. Then I set the schema value to 'https' as expected in…
3
votes
1 answer

GCP Composer - ftplib timeouterror errno 110 connection timed out

I am trying to get data from a txt file on an FTP server via GCP Composer tasks, so I imported and used the ftplib package in my code, like this: ftp = FTP() ftp.connect(host=HOST, port=PORT,…
박현균
  • 59
  • 1
  • 5
3
votes
2 answers

Install GPU Driver on autoscaling Node in GKE (Cloud Composer)

I'm running a Google Cloud Composer GKE cluster. I have a default node pool of 3 normal CPU nodes and one node pool with a GPU node. The GPU node pool has autoscaling activated. I want to run a script inside a Docker container on that GPU node. For…
3
votes
0 answers

BigQuery operator - query refresh - Airflow

We are using Airflow 2.1.4 via Google Cloud Composer and are referencing our queries via the "BigQueryInsertJobOperator"; for the query we reference a path on the Composer GCS bucket (i.e. "query" : "{% include ' ...). This works fine except…