
I have a script based on a Python client library from GCP that is meant to provision clusters and submit jobs to them. When I run the script, it successfully uploads files to Google Storage, creates a cluster, and submits a job. The error comes in when it's running my `wait_for_job()` function, as shown by the following traceback:

    Waiting for job to finish...
    Traceback (most recent call last):
      File "/Users/cdastmalchi/WGS_automation_python_SDK.py", line 174, in <module>
        main()
      File "/Users/cdastmalchi/WGS_automation_python_SDK.py", line 168, in main
        wait_for_job(dataproc, args.project_id, region, args.cluster_name)
      File "/Users/cdastmalchi/WGS_automation_python_SDK.py", line 132, in wait_for_job
        jobId=job_id).execute()
      File "/anaconda/lib/python2.7/site-packages/oauth2client/util.py", line 137, in positional_wrapper
        return wrapped(*args, **kwargs)
      File "/anaconda/lib/python2.7/site-packages/googleapiclient/http.py", line 842, in execute
        raise HttpError(resp, content, uri=self.uri)
    googleapiclient.errors.HttpError: <HttpError 404 when requesting https://dataproc.googleapis.com/v1/projects/my-project/regions/us-east4/jobs/my-cluster?alt=json returned "Job not found my-project/my-cluster">

Here is my `wait_for_job()` function:

    def wait_for_job(dataproc, project, region, job_id):
        print('Waiting for job to finish...')
        while True:
            result = dataproc.projects().regions().jobs().get(
                projectId=project,
                region=region,
                jobId=job_id).execute()
            # Handle exceptions
            if result['status']['state'] == 'ERROR':
                raise Exception(result['status']['details'])
            elif result['status']['state'] == 'DONE':
                print('Job finished.')
                return result

Here is my `create_cluster()` function:

    def create_cluster(dataproc, project, zone, region, cluster_name, master_type, worker_type):
        print('Creating cluster...')
        zone_uri = ('https://www.googleapis.com/compute/v1/'
                    'projects/{}/zones/{}'.format(project, zone))
        cluster_data = {
            'projectId': project,
            'clusterName': cluster_name,
            'config': {
                'gceClusterConfig': {
                    'zoneUri': zone_uri,
                },
                'masterConfig': {
                    'machineTypeUri': master_type,
                },
                'workerConfig': {
                    'machineTypeUri': worker_type,
                },
            }
        }

        result = dataproc.projects().regions().clusters().create(
            projectId=project,
            region=region,
            body=cluster_data).execute()
        return result

Do you think the problem has to do with regions/zones? My cluster is in us-east4-b, and the attempted job submission was in us-east4.

claudiadast
  • 'jobs/my-cluster' looks highly suspect. Are you sure your job ID is 'my-cluster'? – tix Sep 20 '17 at 17:44
  • @tix: yes that part seems fishy. My Job ID is not 'my-cluster'. 'my-cluster' is the cluster name. I will include the `create_cluster()` function as well in case that helps catch the problem. – claudiadast Sep 20 '17 at 18:32
  • I'd trace how wait_for_job() is invoked, specifically where "job_id" value comes from (maybe log all the arguments) – tix Sep 20 '17 at 21:14

1 Answer


Your error message shows that your code is passing `args.cluster_name` to `wait_for_job`, while the method signature for `wait_for_job` expects a job ID as the last argument, not a cluster name:

    File "/Users/cdastmalchi/WGS_automation_python_SDK.py", line 168, in main
      wait_for_job(dataproc, args.project_id, region, args.cluster_name)

You need to change that argument to be your job ID instead.
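As a sketch of the fix (the helper names `submit_job_and_get_id` and `get_job_id` are hypothetical, not from your script): the Dataproc v1 `jobs.submit` response is the created Job resource, which carries the server-assigned job ID under `reference.jobId`. Capture that at submission time and hand it to `wait_for_job()` rather than the cluster name:

```python
def get_job_id(submit_result):
    # The jobs.submit response is the created Job resource; the
    # server-assigned ID lives under reference.jobId.
    return submit_result['reference']['jobId']

def submit_job_and_get_id(dataproc, project, region, job_details):
    # Submit the job, then return its generated job ID so it can be
    # passed to wait_for_job() instead of the cluster name.
    result = dataproc.projects().regions().jobs().submit(
        projectId=project,
        region=region,
        body=job_details).execute()
    return get_job_id(result)
```

Then in `main()`, call `wait_for_job(dataproc, args.project_id, region, job_id)` with the returned value.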

Dennis Huo