
My company is evaluating if we can use Google Dataflow.

I have run a Dataflow job on Google Cloud Platform. The console shows 5 hr 25 min in the "Reserved CPU Time" field on the right.

Worker configuration: n1-standard-4

Starting 8 workers...

How do I calculate the cost of the Dataflow job? According to this page the price is $0.01 per GCEU per hour. How can I find the number of GCEUs consumed by my job, and the number of hours?

Cloud Dataflow Console

Sergey Grigoriev

3 Answers


You can find the number of GCEUs per machine here: https://cloud.google.com/compute/docs/machine-types. For example, n1-standard-4s are 11 GCEUs.

The cost of a batch Dataflow job (in addition to the raw cost of VMs) is then

(Reserved CPU time in hours) / (Cores per machine) * (GCEUs) * $.01

Then, the total cost of the job is

(machine hours) * ((GCEUs) * $.01 + (machine cost per hour) + (PD cost per hour for attached disks))

For example, for n1-standard-4 with 250GB disks, this works out to (11 * $.01 + $.152 + ($.04 * 250 / 30 / 24)) = $.276 per machine-hour.
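The two formulas above can be sketched as a small script. This is only a sketch using the example figures quoted in this answer (n1-standard-4 = 11 GCEUs at $0.152/hr, 250 GB PD at $0.04 per GB-month), which may not match current pricing:

```python
# Sketch of the old GCEU-based batch Dataflow pricing described above.
# The GCEU count, VM price, and PD price below are the example values
# quoted in the answer, not necessarily current rates.

GCEU_PRICE_PER_HOUR = 0.01

def per_machine_hour(gceus, vm_price_per_hour, pd_gb, pd_price_per_gb_month):
    """Cost of one worker VM for one hour: Dataflow service fee + VM + disk."""
    pd_per_hour = pd_price_per_gb_month * pd_gb / 30 / 24  # GB-month -> hour
    return gceus * GCEU_PRICE_PER_HOUR + vm_price_per_hour + pd_per_hour

def job_cost(reserved_cpu_hours, cores_per_machine, gceus,
             vm_price_per_hour, pd_gb, pd_price_per_gb_month):
    """Total job cost = machine-hours * cost per machine-hour."""
    machine_hours = reserved_cpu_hours / cores_per_machine
    return machine_hours * per_machine_hour(
        gceus, vm_price_per_hour, pd_gb, pd_price_per_gb_month)

# The n1-standard-4 example from the answer: ~$0.276 per machine-hour.
rate = per_machine_hour(gceus=11, vm_price_per_hour=0.152,
                        pd_gb=250, pd_price_per_gb_month=0.04)
print(round(rate, 3))  # 0.276

# The question's job: 5 hr 25 min reserved CPU time on 4-core machines.
print(round(job_cost(5 + 25 / 60, 4, 11, 0.152, 250, 0.04), 2))  # ~ $0.37
```

Note that dividing reserved CPU time by cores gives machine-hours, so the question's 5 hr 25 min on 4-core workers is about 1.35 machine-hours.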

danielm
  • Thank you, could you extend your equation and include the calculation of the raw cost of VMs to produce the total? I need the total cost of the dataflow. – Sergey Grigoriev Jan 14 '16 at 20:00
  • There is a new pricing model for Dataflow since 2018-05-03. See post below. – antono Apr 23 '18 at 08:58

There is a new pricing model for Dataflow since 2018-05-03.

Now you should use the following formula:

(vcpu_hours * vcpu_hourly_price) +
(mem_hours  * mem_hourly_price) +
(disk_hours * disk_hourly_price)

Additional costs for Shuffle may apply.
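The new formula is a straight sum of resource-hours times unit prices, which can be sketched as below. The unit prices passed to the example call are illustrative placeholders; look up the current batch rates for your region on the Dataflow pricing page:

```python
# Sketch of the resource-based Dataflow pricing described above.
# Unit prices must come from the Dataflow pricing page for your region;
# the ones used in the example call are placeholders.

def dataflow_cost(vcpu_hours, mem_gb_hours, disk_gb_hours,
                  vcpu_hourly_price, mem_hourly_price, disk_hourly_price):
    return (vcpu_hours * vcpu_hourly_price
            + mem_gb_hours * mem_hourly_price
            + disk_gb_hours * disk_hourly_price)

# Example with placeholder unit prices (not current list prices):
print(dataflow_cost(vcpu_hours=40, mem_gb_hours=150, disk_gb_hours=2500,
                    vcpu_hourly_price=0.056, mem_hourly_price=0.003557,
                    disk_hourly_price=0.000054))
```

The resource-hour totals (vCPU, memory GB, and disk GB hours) are reported per job in the Dataflow console, so this is a direct plug-in calculation.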

antono

If you enable billing export to BigQuery, it's easy to compute the cost of a single Dataflow job with the query below, filling in the correct values for GCP_PROJECT, BILLING_TABLE_NAME, and DATAFLOW_JOB_ID. The query is:

SELECT
  l.value AS job_id,
  ROUND(SUM(cost),3) AS cost 
FROM `$GCP_PROJECT.$BILLING_TABLE_NAME` bill, UNNEST(bill.labels) l
WHERE service.description = 'Cloud Dataflow' AND l.value = '$DATAFLOW_JOB_ID'
GROUP BY 1;

You can find the value for DATAFLOW_JOB_ID in the Dataflow UI, and BILLING_TABLE_NAME in the BigQuery UI. The BILLING_TABLE_NAME will be of the format gcp_billing_export_resource_$ACCOUNT_ID.

NOTE: From personal experience it seems to take quite a while before the billing table is populated with the pricing information.
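A safer way to fill in the job ID is to pass it as a BigQuery named parameter (@job_id) rather than pasting it into the SQL. Below is a small stdlib-only sketch that builds the query that way; the project and table names in the example are placeholders, and you would hand the query plus the parameter to your client of choice (e.g. `bq query --use_legacy_sql=false --parameter=job_id:STRING:<id>`):

```python
# Sketch: build the billing query with the Dataflow job ID as a named
# parameter (@job_id) instead of interpolating it into the SQL string.
# The project/table values used below are placeholders.

def billing_query(project, billing_table):
    """Return the billing-export query for one Dataflow job."""
    return f"""\
SELECT
  l.value AS job_id,
  ROUND(SUM(cost), 3) AS cost
FROM `{project}.{billing_table}` bill, UNNEST(bill.labels) l
WHERE service.description = 'Cloud Dataflow' AND l.value = @job_id
GROUP BY 1"""

# Placeholder identifiers -- substitute your own project and billing table.
query = billing_query("my-project", "billing.gcp_billing_export_resource_XXX")
print(query)
```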

DzedCPT