Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job represented as a DAG (directed acyclic graph) of operators. You can use Airflow's built-in operator definitions and/or define your own in pure Python, as in the sketch below.
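
For illustration, a minimal sketch of such a DAG (Airflow 2 imports; the task names and schedule are made up):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def greet():
        # Any plain-Python callable can back a task.
        print("hello from Composer")

    with DAG(
        dag_id="example_dag",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        say_hello = PythonOperator(task_id="say_hello", python_callable=greet)
        done = BashOperator(task_id="done", bash_command="echo finished")
        say_hello >> done  # ">>" declares the dependency edge in the graph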

While technically you can do data processing directly within a task (an instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything: a container, BigQuery, Spark, etc.). Often you will then wait for that processing to complete using an Airflow sensor, and possibly launch further dependent tasks.
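
A hedged sketch of that pattern, using an operator and a sensor from the Google provider (the bucket and object names are made up):

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor

    with DAG(dag_id="offload_example", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
        # Kick off processing in another system (here, a BigQuery query job).
        run_query = BigQueryInsertJobOperator(
            task_id="run_query",
            configuration={"query": {"query": "SELECT 1", "useLegacySql": False}},
        )
        # Block until a downstream artifact appears, then dependents can run.
        wait_for_export = GCSObjectExistenceSensor(
            task_id="wait_for_export",
            bucket="my-bucket",           # hypothetical bucket
            object="exports/output.csv",  # hypothetical object
        )
        run_query >> wait_for_export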

While Cloud Composer is managed, you can apply a variety of customizations, such as which pip packages to install, the hardware configuration, and environment variables. Cloud Composer allows overriding some, but not all, Airflow configuration settings.
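
For example (the environment name and location are illustrative; gcloud accepts one update type per invocation):

    # Install extra PyPI packages from a requirements file:
    gcloud composer environments update my-env --location us-central1 \
        --update-pypi-packages-from-file requirements.txt

    # Set an environment variable on the Airflow workers:
    gcloud composer environments update my-env --location us-central1 \
        --update-env-variables=MY_VAR=some-value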

Further technical details: Cloud Composer creates a Kubernetes (GKE) cluster for each Airflow environment. This is where your tasks run, but you don't have to manage the cluster yourself. You place your code in a specific Cloud Storage bucket, and Cloud Composer syncs it from there.
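
For example, deploying a DAG is just a copy into the bucket's dags/ folder (the bucket name here is illustrative):

    gsutil cp my_dag.py gs://us-central1-my-env-abc123-bucket/dags/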

1225 questions
5 votes, 1 answer

How to use the Airflow DataFlowPythonOperator for a Beam pipeline?

Before using DataFlowPythonOperator, I was using Airflow's BashOperator. It was working fine. My Beam pipeline required a certain argument; here is the command which I used in BashOperator. Just for info: this Beam pipeline is for converting CSV…
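
For context, a hedged sketch of the operator this question asks about, using the legacy contrib import it refers to (the paths, project, and options are illustrative, and an enclosing DAG is assumed):

    from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

    run_beam = DataFlowPythonOperator(
        task_id="run_beam",
        py_file="gs://my-bucket/pipelines/csv_convert.py",  # the Beam pipeline script
        options={"input": "gs://my-bucket/in.csv"},         # pipeline-specific arguments
        dataflow_default_options={
            "project": "my-project",
            "temp_location": "gs://my-bucket/tmp",
        },
    )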
5 votes, 2 answers

Airflow export all tables of a postgres DB to BigQuery

I'm currently using Airflow PostgresToGoogleCloudStorageOperator and GoogleCloudStorageToBigQueryOperator to export every table of my Postgres DB (hosted on AWS RDS) to BigQuery. It works but I have 75 tables, so Airflow creates 75 * 2 jobs. Since…
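
For context, the usual shape of such a DAG is a Python loop that stamps out one export/load pair per table; a hedged sketch with the contrib-era imports the question names (the table list, bucket, and connection ids are made up, and an enclosing DAG is assumed):

    from airflow.contrib.operators.postgres_to_gcs_operator import PostgresToGoogleCloudStorageOperator
    from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

    for table in ["users", "orders"]:  # hypothetical table list
        export = PostgresToGoogleCloudStorageOperator(
            task_id=f"export_{table}",
            sql=f"SELECT * FROM {table}",
            bucket="my-bucket",
            filename=f"{table}/part-{{}}.json",  # "{}" is the operator's chunk index
            postgres_conn_id="my_postgres",
        )
        load = GoogleCloudStorageToBigQueryOperator(
            task_id=f"load_{table}",
            bucket="my-bucket",
            source_objects=[f"{table}/part-*.json"],
            destination_project_dataset_table=f"my_dataset.{table}",
            source_format="NEWLINE_DELIMITED_JSON",
            write_disposition="WRITE_TRUNCATE",
        )
        export >> load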
5 votes, 1 answer

How to attach a file using email operator in Airflow

I've used the argument files=["abc.txt"]. I got the info from the Airflow docs: https://airflow.readthedocs.io/en/stable/_modules/airflow/operators/email_operator.html But I'm getting the error that the file is not found. My question is from where…
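
For context: Composer maps the environment bucket's data/ folder to /home/airflow/gcs/data on the workers, so attachments generally need a path visible there. A hedged sketch (the address and filename are made up, and an enclosing DAG is assumed):

    from airflow.operators.email import EmailOperator

    send_report = EmailOperator(
        task_id="send_report",
        to="someone@example.com",
        subject="Daily report",
        html_content="Report attached.",
        files=["/home/airflow/gcs/data/abc.txt"],  # i.e. gs://<env-bucket>/data/abc.txt
    )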
5 votes, 2 answers

Any success story installing private dependency on GCP Composer Airflow?

Background info: Normally, within a container environment, I can easily install my private dependency with a requirements.txt like this: --index-url https://user:pass@some_repo.jfrog.io/some_repo/api/pypi/pypi/simple some-private-lib The package…
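
For context, one approach Composer has documented (details vary by version, so treat this as an assumption) is to upload a pip.conf pointing at the private index into the environment bucket's config/pip/ folder:

    # pip.conf
    [global]
    extra-index-url = https://user:pass@some_repo.jfrog.io/some_repo/api/pypi/pypi/simple

    # Upload it where Composer's package installer looks (bucket name is illustrative):
    gsutil cp pip.conf gs://<env-bucket>/config/pip/pip.conf
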
5 votes, 2 answers

KubernetesPodOperator privileged security_context in Airflow

I am running Airflow on Google's Cloud Composer. I am using the KubernetesPodOperator and would like to mount a google storage bucket to a directory in pod via gcsfuse. It seems like to do this I need to give k8s privileged security context as…
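
For context: recent cncf.kubernetes provider versions expose a container-level security context (older contrib releases only had a pod-level security_context dict). A hedged sketch, assuming a recent provider and an enclosing DAG (the image name is made up):

    from kubernetes.client import models as k8s
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

    fuse_task = KubernetesPodOperator(
        task_id="gcsfuse_task",
        name="gcsfuse-task",
        namespace="default",
        image="gcr.io/my-project/gcsfuse-image",
        # Privileged mode is a container-level setting in the pod spec.
        container_security_context=k8s.V1SecurityContext(privileged=True),
    )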
5 votes, 3 answers

How to mount a single file as a volume to KubernetesPodOperator?

I have a docker image that expects a mounted JSON credentials file on startup. The container is started through a command like: docker run -v [CREDENTIALS_FILE]:/credentials.json image_name This image lives on Google Container Registry and I'd like…
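
For context, with the Airflow 2 cncf.kubernetes provider the usual trick is a volume mount with sub_path so only one file lands in the container; a hedged sketch assuming the credentials live in a Kubernetes Secret (the secret and image names are made up, and an enclosing DAG is assumed):

    from kubernetes.client import models as k8s
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

    creds_volume = k8s.V1Volume(
        name="creds",
        secret=k8s.V1SecretVolumeSource(secret_name="my-credentials"),
    )
    creds_mount = k8s.V1VolumeMount(
        name="creds",
        mount_path="/credentials.json",
        sub_path="credentials.json",  # mount a single key as a single file
        read_only=True,
    )

    run_image = KubernetesPodOperator(
        task_id="run_image",
        name="run-image",
        namespace="default",
        image="gcr.io/my-project/image_name",
        volumes=[creds_volume],
        volume_mounts=[creds_mount],
    )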
5 votes, 3 answers

GKE autoscaling doesn't scale down

We use GKE (Google Kubernetes Engine) to run Airflow in GCC (Google Cloud Composer) for our data pipeline. We started out with 6 nodes, and realised that the costs spiked and we didn't use that much CPU. So we thought we could lower the…
5 votes, 1 answer

Creating a sidecar with KubernetesPodOperator

I am looking to create a sidecar container while using KubernetesPodOperator. I am seeing options to create an init container with pod_mutation_hook, but I am not seeing an option to create a sidecar. If I create an init container that has to complete…
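
For context, a hedged sketch of the pod_mutation_hook route the question mentions, assuming Airflow 2.x where the hook receives a Kubernetes V1Pod (the container name and image are made up):

    # airflow_local_settings.py
    from kubernetes.client import models as k8s

    def pod_mutation_hook(pod: k8s.V1Pod):
        # Kubernetes has no dedicated "sidecar" field; a sidecar is simply
        # another container appended to the pod spec.
        pod.spec.containers.append(
            k8s.V1Container(name="sidecar", image="busybox", command=["sleep", "3600"])
        )
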
5 votes, 4 answers

Task fails due to not being able to read log file

Composer is failing a task because it cannot read a log file; it's complaining about incorrect encoding. Here's the log that appears in the UI: *** Unable to read remote log from…
5 votes, 3 answers

Google Cloud Composer taking too long to install dependencies

I'm following the documentation for Google Cloud Composer to install Python dependencies from PyPI in an environment. I used this command to install the libraries from a requirements file: $ gcloud composer environments update $ENV_NAME \ …
5 votes, 1 answer

Disable new DAGs by default

Is there a setting in Cloud Composer / Airflow that can disable new DAGs in the DAGs folder by default, without the need for specifying this in the DAG files themselves? I want to be able to load these DAGs into a development environment where…
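
For context, Airflow's [core] dags_are_paused_at_creation setting controls this, and it appears to be one of the options Composer lets you override (the environment name and location are illustrative):

    gcloud composer environments update my-env --location us-central1 \
        --update-airflow-configs=core-dags_are_paused_at_creation=True
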
5 votes, 2 answers

Airflow SFTPHook - No hostkey for host found

I'm trying to use the Airflow SFTPHook by passing in a ssh_conn_id and I'm getting an error: No hostkey for host myhostname found. Using the SFTPOperator with the same ssh_conn_id however is working fine. How can I resolve this error?
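
For context: host-key checking is typically controlled through the SSH connection's Extra field, though whether SFTPHook honours it varies by Airflow version, so treat this as an assumption:

    {"no_host_key_check": "true"}
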
5 votes, 1 answer

Import variables using json file in Google Cloud Composer

How can I import a JSON file into Google Cloud Composer using the command line? I tried the command gcloud composer environments run comp-env --location=us-central1 variables -- --import composer_variables.json, and I am getting the below…
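
For context: gcloud composer environments run executes the Airflow CLI inside the environment, so a local file path will not resolve there; the usual workaround is to stage the file in the bucket's data/ folder (mapped to /home/airflow/gcs/data on the workers) first:

    gsutil cp composer_variables.json gs://<env-bucket>/data/

    gcloud composer environments run comp-env --location=us-central1 \
        variables -- --import /home/airflow/gcs/data/composer_variables.json
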
5 votes, 2 answers

Google Cloud Composer The server encountered a temporary error and could not complete your request

After running for a couple of days, the Google Cloud Composer web UI returns a 502 Server Error indefinitely: Error: Server Error The server encountered a temporary error and could not complete your request. Please try again in 30 seconds. The only…
5 votes, 2 answers

How to get Airflow db credentials from Google Cloud Composer

I currently need the Airflow db connection credentials for my Airflow instance in Cloud Composer. All I see on the Airflow connection UI is airflow_db mysql airflow-sqlproxy-service. I would like to connect to it via DataGrip. Another thing is if I…
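
For context, a commonly cited workaround is to port-forward the airflow-sqlproxy-service from the environment's GKE cluster and point DataGrip at localhost (the cluster name and zone are illustrative):

    gcloud container clusters get-credentials my-composer-gke-cluster --zone us-central1-a
    kubectl port-forward service/airflow-sqlproxy-service 3306:3306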