I am trying to run a Vertex AI Pipeline.

The pipeline is created successfully; the log shows "PipelineJob created. Resource name: XXX".

Then I get PipelineState.PIPELINE_STATE_PENDING multiple times until it crashes with this error:

Traceback (most recent call last):
  File "/src/pipelines/build_model/pipeline_run.py", line 288, in <module>
    cli()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/src/pipelines/build_model/pipeline_run.py", line 284, in cli
    job.run()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 314, in run
    self._run(
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 810, in wrapper
    return method(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 351, in _run
    self._block_until_complete()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 499, in _block_until_complete
    raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
RuntimeError: Job failed with:
code: 13
message: "Internal error encountered. Please try again"

This pipeline currently works in a dev GCP project, where it automatically goes into the RUNNING state.

I have this issue when I try to make it work in another GCP project. I have reproduced the same steps (API enabled, service account created, same rights, same location); in my code I only change the project_id and credentials.

I have tried changing the location to check that it is not due to a lack of resources on Google's side. I also tried a really simple Hello World pipeline, and it never reaches the RUNNING state either.

I have also checked Cloud Logging but can't find anything useful.

Any ideas? Thanks

L.GAYET
  • Internal errors are mainly caused by system errors and are mostly transient. But since they are not very descriptive, I would advise opening a [support ticket](https://www.google.com/aclk?sa=l&ai=DChcSEwiy2Yjz1uz-AhVjmmYCHcOWC9EYABABGgJzbQ&sig=AOD64_2jnoaj-Kt3pj5MUKzCPSajdYF0DA&adurl&ved=2ahUKEwj1xYLz1uz-AhVV-DgGHRlkDDgQqyQoAHoECAgQCw) with GCP or creating an issue in the GCP [public issue tracker](https://cloud.google.com/support/docs/issue-trackers) to get a precise issue description and solution. – Sakshi Gatyan May 11 '23 at 06:53
  • Don't you find it weird that the pipeline doesn't even start? How can it be a system error if no node is executed? – L.GAYET May 11 '23 at 07:49

2 Answers

I finally found out what was missing: some IAM permissions (for Cloud Storage and BigQuery in my case).
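One rough way to spot this kind of gap is to export the project-level IAM policy of both projects (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`) and diff the roles bound to the pipeline's service account. The sketch below only assumes the standard `bindings` / `role` / `members` layout of that JSON. The function names, service account addresses and sample roles are hypothetical and chosen for illustration; they are not the actual setup from the question.

```python
def roles_for_member(policy: dict, member: str) -> set:
    """Return the roles that a project-level IAM policy binds directly to `member`."""
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def missing_roles(ref_policy: dict, ref_member: str,
                  target_policy: dict, target_member: str) -> list:
    """Roles held by `ref_member` in the reference project but not by
    `target_member` in the target project, sorted alphabetically."""
    ref = roles_for_member(ref_policy, ref_member)
    target = roles_for_member(target_policy, target_member)
    return sorted(ref - target)


# Hypothetical sample data, shaped like the JSON printed by
# `gcloud projects get-iam-policy PROJECT_ID --format=json`.
DEV_SA = "serviceAccount:pipeline-sa@dev-project.iam.gserviceaccount.com"
NEW_SA = "serviceAccount:pipeline-sa@new-project.iam.gserviceaccount.com"

dev_policy = {
    "bindings": [
        {"role": "roles/aiplatform.user", "members": [DEV_SA]},
        {"role": "roles/storage.objectAdmin", "members": [DEV_SA]},
        {"role": "roles/bigquery.dataEditor", "members": [DEV_SA]},
    ]
}
new_policy = {
    "bindings": [
        {"role": "roles/aiplatform.user", "members": [NEW_SA]},
    ]
}

if __name__ == "__main__":
    for role in missing_roles(dev_policy, DEV_SA, new_policy, NEW_SA):
        print("missing in new project:", role)
```

This only compares direct project-level bindings. Roles granted on an individual bucket or dataset, or inherited through a group, a folder or the organization, would not show up and need to be checked separately.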

L.GAYET

I got this error using a GCS bucket in a different region than the region my pipeline ran in.
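A small pre-flight check can catch this kind of mismatch before the job is submitted. The sketch below is only an illustration: `same_location` is a hypothetical helper, and it assumes you have already looked up the bucket's location yourself (Cloud Storage tends to report it in upper case, e.g. `EUROPE-WEST1`) and know the location string you pass to the pipeline.

```python
def same_location(bucket_location: str, pipeline_location: str) -> bool:
    """Case-insensitive comparison of a bucket location and a pipeline region.

    This is a deliberately strict check: a multi-region bucket such as "EU"
    is reported as a mismatch for "europe-west1", which may be more
    conservative than what the service actually requires.
    """
    return bucket_location.strip().lower() == pipeline_location.strip().lower()


if __name__ == "__main__":
    # Hypothetical values for illustration.
    bucket_location = "US-CENTRAL1"
    pipeline_location = "europe-west1"
    if not same_location(bucket_location, pipeline_location):
        print(
            f"Bucket is in {bucket_location} but the pipeline runs in "
            f"{pipeline_location}; consider using a bucket in the same region."
        )
```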

Roy van Santen