
We are looking to get some clarity on Cloud Data Fusion pricing. It looks like once we create a Cloud Data Fusion instance, we incur hourly charges for as long as the instance is alive. This can be quite expensive: $1100 per month for a development instance and $3000 per month for an enterprise instance. https://cloud.google.com/data-fusion/pricing There seems to be no way to stop an instance, only to delete it; support confirmed this.

However, the pricing page distinguishes between development and execution. We are wondering whether we can avoid the instance charges once we are done deploying a pipeline. It is not clear whether this is possible, or whether a deployed pipeline even requires an instance.

Thanks.

sacoder
  • Let me know if I understood correctly: you want an instance that is on only when your pipeline has to run? – rmesteves Feb 11 '21 at 14:37
  • Yes, or an alternative way to create and deploy pipelines without incurring the instance charges once development is over. It looks like I can accomplish this by developing the pipeline, exporting it, and running it on an independent Dataproc cluster. It would be great if an example of the steps were available. Thanks! – sacoder Feb 12 '21 at 13:54
  • Unfortunately, you cannot just turn off your Data Fusion instance. If you want to avoid costs and you don't need an instance that is on all the time, I suggest that you use a Dataproc cluster and orchestrate your process with another product such as Composer. – rmesteves Feb 15 '21 at 16:49



You can deploy your pipeline in 2 modes:

  • Either Cloud Data Fusion creates an ephemeral cluster, deploys your pipeline on it, and tears the cluster down at the end. In this mode you need to keep the Data Fusion instance running so that it can tear down the cluster, although you can still delete it beforehand.
  • Or Cloud Data Fusion runs the pipeline on an existing cluster. In this mode, once the pipeline is deployed and started, you can shut down the instance.
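As a rough illustration of the second mode, the sketch below starts a deployed batch pipeline through the instance's CDAP REST API and selects a compute profile with the `system.profile.name` runtime argument. It assumes, from my understanding of the CDAP REST API, that batch pipelines run as the `DataPipelineWorkflow` program; the API endpoint, the pipeline name, and the profile name `USER:existing-dataproc` (a profile you would have configured to point at an existing Dataproc cluster) are all hypothetical placeholders, so check them against your instance before relying on this.

```python
import json
import urllib.request


def build_start_request(api_endpoint: str, pipeline: str, profile: str,
                        namespace: str = "default") -> tuple[str, bytes]:
    """Build the URL and JSON body that start a batch pipeline with a compute profile."""
    # Assumption: deployed batch pipelines run as the DataPipelineWorkflow program.
    url = (f"{api_endpoint.rstrip('/')}/v3/namespaces/{namespace}"
           f"/apps/{pipeline}/workflows/DataPipelineWorkflow/start")
    # The runtime argument system.profile.name selects the compute profile,
    # e.g. a hypothetical user-scoped profile targeting an existing cluster.
    body = json.dumps({"system.profile.name": profile}).encode("utf-8")
    return url, body


def start_pipeline(api_endpoint: str, pipeline: str, profile: str, token: str,
                   namespace: str = "default") -> int:
    """POST the start request; token is an OAuth 2.0 access token for the instance."""
    url, body = build_start_request(api_endpoint, pipeline, profile, namespace)
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical values: no request is sent, the URL and body are only printed.
    url, body = build_start_request("https://example.invalid/api",
                                    "my-pipeline", "USER:existing-dataproc")
    print(url)
    print(body.decode("utf-8"))
```

Building the request separately from sending it keeps the URL and payload easy to inspect before pointing the sketch at a real instance.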

I agree, it's not clear, but you can deduce this if you know how a Hadoop cluster works.

Note: don't forget to export your pipeline before deleting the instance.
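One possible way to script that export is sketched below. It assumes that the CDAP endpoint `GET /v3/namespaces/{namespace}/apps/{app}` returns the application details with the pipeline's configuration as a JSON string in a `configuration` field, and it reshapes that into a `name`/`description`/`artifact`/`config` document similar to what the UI export produces. The endpoint, pipeline name, and output path are hypothetical, and the exact export layout is an assumption; the UI's own Export action remains the reference.

```python
import json
import urllib.request


def to_export_json(app_detail: dict) -> dict:
    """Reshape CDAP application details into an export-style pipeline document."""
    return {
        "name": app_detail["name"],
        "description": app_detail.get("description", ""),
        # Artifact that the pipeline runs on (name, version, scope).
        "artifact": app_detail["artifact"],
        # Assumption: the pipeline config is stored as a JSON string.
        "config": json.loads(app_detail["configuration"]),
    }


def export_pipeline(api_endpoint: str, pipeline: str, token: str,
                    out_path: str, namespace: str = "default") -> None:
    """Fetch the deployed pipeline's details and write them to a local JSON file."""
    url = f"{api_endpoint.rstrip('/')}/v3/namespaces/{namespace}/apps/{pipeline}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        detail = json.load(resp)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(to_export_json(detail), f, indent=2)


if __name__ == "__main__":
    # Hypothetical application details; no network call is made here.
    sample = {
        "name": "my-pipeline",
        "description": "example",
        "artifact": {"name": "cdap-data-pipeline", "version": "x.y.z", "scope": "SYSTEM"},
        "configuration": json.dumps({"stages": [], "connections": []}),
    }
    print(json.dumps(to_export_json(sample), indent=2))
```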

Note2: the instance also offers triggers and scheduling to run the pipeline. Of course, if you delete the instance, these features are no longer available to you!

guillaume blaquiere
  • Thanks for this. You are right that this is not fully clear, but it makes some sense. So essentially it is possible to develop the pipeline in Data Fusion, then export it and run it on Dataproc independently? Could you please share whether a tutorial for this use case is available somewhere? – sacoder Feb 12 '21 at 13:55
  • @sacoder did you find a workaround in the end, or are you running the Data Fusion instance all the time? Thanks! – Marigold Feb 04 '22 at 07:51
  • @Marigold No, we did not. We are no longer using Data Fusion; we now run our pipelines on Cloud Composer and Dataflow. – sacoder Feb 05 '22 at 08:04