4

I'm using the GCP Composer API (Airflow) and my DAG to scale up the number of workers keep returning me the error below:

Broken DAG: [/home/airflow/gcs/dags/cluster_scale_workers.py] 'module' object has no attribute 'DataProcClusterScaleOperator' 

Seems to be something related to the ScaleOperator, however when I look at the Airflow Read the Docs and cross check with my code, seems that nothing is wrong. What am I missing?

Is it related to GCP Airflow version?

Code:

import datetime
import os

from airflow import models
from airflow.contrib.operators import dataproc_operator
from airflow.utils import trigger_rule

yesterday = datetime.datetime.combine(
    datetime.datetime.today() - datetime.timedelta(1),
    datetime.datetime.min.time())

default_dag_args = {
    'start_date': yesterday,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
    'project_id': models.Variable.get('gcp_project'),
    'cluster_name': 'hive-cluster'
}

with models.DAG(
        'scale_workers',
        schedule_interval=datetime.timedelta(days=1),
        default_args=default_dag_args) as dag:

    scale_to_6_workers = dataproc_operator.DataprocClusterScaleOperator(
        task_id='scale_dataproc_cluster_6',
        cluster_name='hive-cluster',
        num_workers=6,
        num_preemptible_workers=3,
        dag=dag
        )
Igor Dvorzhak
  • 4,360
  • 3
  • 17
  • 31
RGregg
  • 135
  • 9
  • 1
    Your code seems absolutely right. Which version of airflow your cloud composer is using? Try enabling BETA option at the top right corner of the cloud composer and check. – Ashish Kumar Dec 13 '18 at 08:52
  • (For Information) if you are not specifying the region of your cluster, by default it will take it as global. – Ashish Kumar Dec 13 '18 at 08:53
  • Thanks @AshishKumar, I noticed it was a version thing with my Airflow. I was using 1.9.0. Once I upgraded to 1.10.0, it worked fine. – RGregg Dec 13 '18 at 09:32

1 Answers1

2

I managed to find the issue and sort it out. The comment provided by Ashish Kumar above is correct.

The problem was that the Airflow version I was using (1.9.0) did not support the DataProcClusterScaleOperator. I created another instance by activating BETA and choosing the version 1.10.0.

Which fixed my issue.

RGregg
  • 135
  • 9