Questions tagged [google-cloud-composer]

Google Cloud Composer is a fully managed workflow orchestration service, built on Apache Airflow, that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

Cloud Composer is a product of Google Cloud Platform (GCP). It is essentially "hosted/managed Apache Airflow."

The product allows you to create, schedule, and monitor jobs, each job being represented as a DAG (directed acyclic graph) of various operators. You can use Airflow's built-in operators and/or define your own in pure Python.
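For illustration, here is a minimal sketch of such a DAG, written with Airflow 1.10-style imports (the Airflow line most of the questions below run on); the DAG id, schedule, and task contents are placeholders, not taken from any particular question:

    # Minimal illustrative DAG; dag_id, schedule, and messages are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.python_operator import PythonOperator


    def _greet():
        # Plain-Python task logic, executed by an Airflow worker in the environment.
        print("Hello from Cloud Composer")


    with DAG(
        dag_id="example_composer_dag",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        say_hello = BashOperator(task_id="say_hello", bash_command="echo hello")
        greet = PythonOperator(task_id="greet", python_callable=_greet)

        say_hello >> greet  # greet runs only after say_hello succeeds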

While you can technically do data processing directly within a task (an instantiated operator), more often you will want a task to invoke some sort of processing in another system (which could be anything: a container, BigQuery, Spark, and so on). Often you will then wait for that processing to complete using an Airflow sensor, possibly launch further dependent tasks, and so on.
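A hedged sketch of one common shape of that pattern, using the Airflow 1.10 contrib GCP sensor and operator: a sensor polls until an external system has written its output to Cloud Storage, and the heavy lifting is then delegated to BigQuery rather than done inside the task. The bucket, object path, and table names are made up, and the surrounding DAG is omitted for brevity:

    # Illustrative only: bucket, object path, and destination table are placeholders.
    from airflow.contrib.sensors.gcs_sensor import GoogleCloudStorageObjectSensor
    from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

    # Poll until an upstream system has written today's export to Cloud Storage.
    wait_for_export = GoogleCloudStorageObjectSensor(
        task_id="wait_for_export",
        bucket="my-ingest-bucket",
        object="exports/{{ ds }}/data.csv",
        poke_interval=60,
    )

    # Delegate the actual processing to BigQuery instead of doing it in the task.
    load_to_bq = GoogleCloudStorageToBigQueryOperator(
        task_id="load_to_bq",
        bucket="my-ingest-bucket",
        source_objects=["exports/{{ ds }}/data.csv"],
        destination_project_dataset_table="my_dataset.my_table",
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",
    )

    wait_for_export >> load_to_bq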

While Cloud Composer is managed, you can apply a variety of customizations, such as which PyPI packages to install, the hardware configuration, and environment variables. Cloud Composer allows overriding some, but not all, Airflow configuration settings.

Further technical details: Cloud Composer creates a Kubernetes (GKE) cluster for each Airflow environment you create. This is where your tasks run, but you don't have to manage the cluster yourself. You place your DAG code in a designated Cloud Storage bucket, and Cloud Composer syncs it from there.
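For example, a DAG file can be copied into the environment's bucket with the google-cloud-storage client; the bucket name below is a made-up placeholder (the real one is shown on the environment's details page), and dags/ is the folder Composer syncs DAG files from:

    # Hypothetical helper: copy a local DAG file into the environment's bucket,
    # from where Composer syncs it to the Airflow scheduler and workers.
    from google.cloud import storage

    def upload_dag(local_path, bucket_name):
        client = storage.Client()  # uses application default credentials
        blob = client.bucket(bucket_name).blob("dags/" + local_path.rsplit("/", 1)[-1])
        blob.upload_from_filename(local_path)

    upload_dag("my_dag.py", "us-central1-my-env-abc123-bucket")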

1225 questions
0 votes, 2 answers

Getting *** Task instance did not exist in the DB as error when running gcs_to_bq in composer

While executing the following Python script using Cloud Composer, I get *** Task instance did not exist in the DB under the gcs2bq task log in Airflow. Code: import datetime import os import csv import pandas as pd import pip from airflow import…
Gaurav Taneja • 1,084 • 1 • 8 • 19
0 votes, 1 answer

write a file from a docker container in google-cloud-composer

Some context: I'm using composer-1.3.0-airflow-1.10.0, with the PyPI package docker===2.7.0 installed. For a while I tried to use the DockerOperator, but I need to pull images from a private gcr.io registry located in another GCP project, and that is a…
donkino • 110 • 1 • 9
0 votes, 1 answer

Cloud Composer throwing InvalidToken after adding another node

I recently added a few new DAGs to production Airflow and as a result decided to scale up the number of nodes in the Composer pool. After doing so, I got the error: Can't decrypt _val for key=, invalid token or value. This happens now for every…
0 votes, 0 answers

JVM not found on cloud composer webserver

We are using a JDBC connection in Airflow to fetch data from a public MySQL server. The connection was set up fine and worked as expected on the local machine, but I received the following error in Composer: [Errno 2] No such file or directory:…
Tameem • 408 • 7 • 19
0 votes, 2 answers

airflow DAG keeps retrying without showing any errors

I use Google Cloud Composer. I have a DAG that uses the pandas.read_csv() function to read a .csv.gz file. The DAG keeps retrying without showing any errors. Here is the Airflow log: *** Reading remote log from…
MT467 • 668 • 2 • 15 • 31
0 votes, 1 answer

how to install dask on google composer

I tried to install dask on Google Composer (Airflow). I used PyPI (GCP UI) to add dask and the required packages below (not sure if all the Google ones are required though; couldn't find a requirements.txt): dask toolz partd cloudpickle …
MT467 • 668 • 2 • 15 • 31
0 votes, 1 answer

No module named pymssql when using MsSqlOperator

I'm using Composer version 1.2.0-1.9.0, and I'm trying to use a MsSqlOperator in one of my DAGs. However, when published, Airflow gave me the error: No module named 'pymssql'. Now, I could install it as a PyPi package, but shouldn't it be supported…
Lucas Rosa • 33 • 4
0 votes, 1 answer

How to share resources(compute engines) among projects in google cloud platform

I am trying to create a prototype where I can share resources among projects to run a job within Google Cloud Platform. Motivation: let's say there are two projects, Project A and Project B. I want to use the Dataproc cluster created in…
0 votes, 1 answer

General availability of Google Cloud Composer on Europe-West-3

Is there anybody who knows when google-cloud-composer will be available in europe-west-3? Thanks a lot for your help.
0 votes, 1 answer

Can we implement data lineage on queries run via Google BigQuery?

Could anyone provide some pointers on how to implement data lineage on a DW-type solution built on Google BigQuery, using Google Cloud Storage as the source and Google Cloud Composer as the workflow manager to run a series of SQL queries?
0 votes, 1 answer

Rate limited API requests in Cloud Composer

I'm planning a project whereby I'd be hitting the (rate-limited) Reddit API and storing data in GCS and BigQuery. Initially, Cloud Functions would be the choice, but I'd have to create a Datastore implementation to manage the "pseudo" queue of…
0 votes, 1 answer

Install hadoopy in google composer

I am using Google Composer. How can we install hadoopy in a Google Composer environment? This page has steps for installing hadoopy on a Linux machine: Github Clone git clone https://github.com/bwhite/hadoopy.git cd hadoopy sudo python setup.py…
bob • 4,595 • 2 • 25 • 35
0 votes, 1 answer

Cloud Composer - Get google user

Is there a way to get the Google account name running the DAGs from within the DAG definition? This would be very helpful to track which user was running the DAGs. I only see: unixname --> always airflow owner --> fixed in the dag…
edduuar • 21 • 4
0 votes, 1 answer

Google Cloud Composer with regional kubernetes cluster

I'm putting together a DR plan in case of zone failures in GCP. Currently, Composer runs in a single zone. Is there a way to make its Kubernetes cluster regional?
Bruno • 182 • 2 • 12
0 votes, 1 answer

Cloud Composer non interactive authentication

I've tried a lot before posting this question; I'm not against downvotes, but at least let me know WHY you are downvoting. I have built an Airflow plugin to fetch data from a Cloud Composer Airflow environment, and accessing the Cloud Composer is working…