
Trying out the Google Cloud Composer Quickstart in a free trial account, the example workflow DAG's first task runs this operator:

from airflow import models
from airflow.contrib.operators import dataproc_operator

create_dataproc_cluster = dataproc_operator.DataprocClusterCreateOperator(
    task_id='create_dataproc_cluster',
    cluster_name='quickstart-cluster-{{ ds_nodash }}',
    num_workers=2,
    zone=models.Variable.get('gce_zone'),
    master_machine_type='n1-standard-1',
    worker_machine_type='n1-standard-1')

which fails with this error message:

 - Insufficient 'CPUS' quota. Requested 6.0, available 2.0
 - This request exceeds CPU quota. Some things to try: request fewer workers (a minimum of 2 is required), use smaller master and/or worker machine types (such as n1-standard-2).

2 is already the minimum number of worker nodes and n1-standard-1 is already the smallest machine type.

Q. Is there a way to get DataprocClusterCreateOperator() to allocate a higher 'CPUS' Quota?
The Airflow website and Cloud Dataproc Quotas doc are not forthcoming.

Q. Is this a hard limit for a free trial account?
The IAM Quotas console page shows Current Usage as 6, 75%, at 3 of 4 bars, implying that the quota is 8.

Jerry101
    Dataproc starts Compute Engine VMs, so the CPU quota applies to Compute Engine. Quotas are not related to Airflow; `DataprocClusterCreateOperator` just uses Dataproc APIs to start VMs. However, you can request smaller VMs (with fewer CPUs) for your workers (I do not recommend it, though; `n1-standard-1` is already small for a Dataproc worker). Anyway, do you have some Compute Engine VMs already started? It looks like 6 CPUs are already in use in your case. – norbjd May 25 '19 at 10:12
    @norbjd That's a key insight! It turns out those 6 CPUs are the 3 `n1-standard-2` GKE nodes created to run the Cloud Composer environment. So if I'd picked `n1-standard-1` for the environment node machine type, the Quickstart example would need 6 CPUs total, which ought to fit in the undocumented quota. [Please do write an Answer and I'll accept it.] I retried the Quickstart within an existing project that has full billing (despite their recommendation) and the Dataproc task ran successfully...but wrote nothing to the output bucket. :-( I don't need Dataproc so I'm not going to debug it. – Jerry101 May 25 '19 at 20:10

1 Answer


Dataproc worker machines are in fact Compute Engine VMs, so the CPU quota applies to the Compute Engine API.

CPU quotas are not related to Airflow/Google Cloud Composer and cannot be configured from there. DataprocClusterCreateOperator simply calls Dataproc APIs, which in turn start VMs on Compute Engine.

For free trial accounts, the CPU quota seems to be 8, as you experienced. From the details you provided in the comments, your Composer environment uses 6 of those 8 CPUs (3 * n1-standard-2). Note that you can use smaller machines for Composer (1 CPU each), but you always need at least 3 nodes, so a minimal Composer environment uses 1 * 3 = 3 CPUs. You can save 3 CPUs here if you want, but Airflow stability could be affected.
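As a quick sanity check on the arithmetic (the 8-CPU quota is inferred from the IAM Quotas console page, not documented; machine sizes are as reported in the question and comments):

```python
# Free-trial CPU quota, as inferred from the IAM Quotas console page.
cpu_quota = 8

# Composer environment: 3 GKE nodes of n1-standard-2 (2 vCPUs each).
composer_cpus = 3 * 2                  # 6 CPUs, the "Current Usage" shown
available = cpu_quota - composer_cpus
print(available)                       # 2, matching "available 2.0" in the error

# Jerry101's alternative from the comments: n1-standard-1 Composer nodes
# (3 * 1 vCPU) plus the quickstart's Dataproc cluster
# (1 master + 2 workers, n1-standard-1 each).
total_small = 3 * 1 + 3                # 6 CPUs total
print(total_small <= cpu_quota)        # True: fits within the 8-CPU quota
```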

You can also request smaller VMs (with fewer CPUs) for your Dataproc workers, or fewer Dataproc workers. Again, I do not recommend this, because n1-standard-1 (or smaller) is too small for a Dataproc worker.

Note also that with paid (non-free-trial) accounts, you can request higher quotas. For free trial accounts, I do not think this is possible.

norbjd