
I've recently been trying Composer to run my pipeline and found that it costs much more than I expected. Here is what I got on the bill:

Cloud Composer Cloud Composer vCPU time in South Carolina: 148.749 hours 
[Currency conversion: USD to AUD using rate 1.475] A$17.11
Cloud Composer Cloud Composer SQL vCPU time in South Carolina: 148.749 hours 
[Currency conversion: USD to AUD using rate 1.475] A$27.43

I only used Composer for two or three days, and it was definitely not running 24 hours a day, so I don't know where the 148 hours came from.

Does that mean that after you deploy a DAG to Composer, even when it's not running, the environment is still using resources and accumulating vCPU time?

How can I reduce the cost if I want to use Composer to run my pipeline every day? Thanks.

IanJay

2 Answers


As far as I am aware, autoscaling is not yet a built-in feature of Composer.

At the worker level, you should be able to achieve it by manually modifying the Composer environment's configuration so that its Kubernetes workers scale up and down according to the workload.
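The scaling behaviour described above comes down to a simple rule: run enough workers for the pending work, within fixed bounds. Below is a minimal Python sketch of such a rule. The function name, its parameters and the "tasks divided by per-worker concurrency" formula are hypothetical illustrations, not Composer's or the linked guide's actual implementation; `worker_concurrency` only borrows the name of Airflow's Celery setting.

```python
import math


def desired_worker_count(queued_tasks: int, running_tasks: int,
                         worker_concurrency: int,
                         min_workers: int, max_workers: int) -> int:
    """Hypothetical autoscaling rule: enough workers for all pending work,
    clamped to [min_workers, max_workers]."""
    if worker_concurrency <= 0:
        raise ValueError("worker_concurrency must be positive")
    # Each worker is assumed to run up to `worker_concurrency` tasks at once.
    needed = math.ceil((queued_tasks + running_tasks) / worker_concurrency)
    # Keep at least the floor so the environment stays usable, never exceed the cap.
    return max(min_workers, min(max_workers, needed))


# Idle environment: no tasks, so the pool shrinks to the floor.
print(desired_worker_count(0, 0, worker_concurrency=6, min_workers=1, max_workers=3))
# Daily pipeline run with 14 tasks in flight: ceil(14 / 6) = 3 workers.
print(desired_worker_count(10, 4, worker_concurrency=6, min_workers=1, max_workers=3))
```

With a floor of one worker, the worker pool spends most of the day at its minimum size and only grows while the daily pipeline runs. Other components, such as the Airflow database, keep running regardless of this rule.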

Joshua Hendinata wrote a guide describing the steps needed to enable autoscaling in Composer [1].

You may also be interested in this article, which introduces ways to save on Composer costs [2].

Hope this helps you out!

[1] https://medium.com/traveloka-engineering/enabling-autoscaling-in-google-cloud-composer-ac84d3ddd60

[2] https://medium.com/condenastengineering/automating-a-cloud-composer-development-environment-590cb0f4d880


Cloud Composer primarily charges for the compute resources allocated to an environment, and most of its components keep running even when no DAGs are deployed. Airflow is primarily a workflow scheduler, so there isn't much you can turn off and still expect to be available when a workflow is suddenly ready to run.

In your case, the billed vCPU time comes from your environment's GKE nodes and its managed Airflow database. Apart from the GKE node count, there isn't much you can reduce or turn off, so if you need something smaller, you may want to consider self-managed Airflow or another platform entirely. The same applies if your main goal is simply processing data and you don't need the scheduling that Airflow offers.
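To see where the 148 hours come from, note that vCPU time is billed in vCPU-hours: the number of allocated vCPUs multiplied by wall-clock hours, whether or not a DAG is running. The question does not state how many vCPUs the environment had, so this back-of-the-envelope Python sketch works backwards from the bill. It assumes the environment was up continuously for the two or three days mentioned in the question, and uses only the figures and exchange rate from the bill.

```python
# Figures copied from the bill in the question.
VCPU_HOURS = 148.749  # both SKUs report the same vCPU time
AUD_PER_USD = 1.475   # conversion rate shown on the bill
BILL_AUD = {
    "Cloud Composer vCPU time": 17.11,
    "Cloud Composer SQL vCPU time": 27.43,
}


def implied_always_on_vcpus(vcpu_hours: float, days: float) -> float:
    """vCPUs that, running 24 h/day for `days`, would accrue `vcpu_hours`."""
    return vcpu_hours / (days * 24)


def usd_per_vcpu_hour(amount_aud: float, vcpu_hours: float) -> float:
    """Effective USD rate per vCPU-hour, undoing the bill's currency conversion."""
    return amount_aud / AUD_PER_USD / vcpu_hours


for days in (2, 3):
    vcpus = implied_always_on_vcpus(VCPU_HOURS, days)
    print(f"{days} days -> {vcpus:.2f} vCPUs running non-stop")
for sku, aud in BILL_AUD.items():
    print(f"{sku}: ~${usd_per_vcpu_hour(aud, VCPU_HOURS):.3f} per vCPU-hour")
```

This works out to about 3.10 vCPUs for two days or 2.07 for three, at roughly US$0.078 and US$0.125 per vCPU-hour for the two line items. In other words, a few vCPUs allocated around the clock are enough to explain the bill, even with very little actual DAG activity.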

hexacyanide