
I'm attempting to run a Vertex pipeline (custom model training) which I was able to run successfully in a different project. As far as I'm aware, all the pieces of infrastructure (service accounts, buckets, etc.) are identical.

The error appears in a gray box in the pipeline UI when I click on the model training component and reads the following:

Retryable error reported. System is retrying.
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=ABORTED, message=Specified Execution `etag`: `1662555654045` does not match server `etag`: `1662555533339`, cause=null System is retrying.
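The `ABORTED` etag mismatch looks like an optimistic-concurrency conflict: a write to the Execution was rejected because the server's copy had changed since it was read, so the caller is expected to re-read and retry. The toy model below illustrates only that general pattern. `ToyStore`, `EtagMismatch`, and `update_with_retry` are hypothetical names, and nothing here describes how Vertex AI's metadata service is actually implemented. Unlike the behaviour reported here, the sketch caps its retries.

```python
from dataclasses import dataclass


class EtagMismatch(Exception):
    """Hypothetical error for a stale etag (loosely analogous to code=ABORTED)."""


@dataclass
class Record:
    value: dict
    etag: int


class ToyStore:
    """Hypothetical in-memory store that uses etags for optimistic concurrency."""

    def __init__(self) -> None:
        self._record = Record(value={}, etag=0)

    def read(self) -> Record:
        # Return a copy so callers cannot mutate the stored state directly.
        return Record(value=dict(self._record.value), etag=self._record.etag)

    def write(self, value: dict, etag: int) -> int:
        # Reject the write if the record changed since the caller read it.
        if etag != self._record.etag:
            raise EtagMismatch(
                f"Specified etag {etag} does not match server etag {self._record.etag}"
            )
        self._record = Record(value=dict(value), etag=self._record.etag + 1)
        return self._record.etag


def update_with_retry(store: ToyStore, key: str, val: str, max_attempts: int = 5) -> int:
    """Read-modify-write loop: on a stale etag, re-read and try again (bounded)."""
    for _ in range(max_attempts):
        snapshot = store.read()
        snapshot.value[key] = val
        try:
            return store.write(snapshot.value, snapshot.etag)
        except EtagMismatch:
            continue  # Another writer got there first; retry with a fresh read.
    raise RuntimeError("gave up after repeated etag mismatches")
```

For example, a write that reuses an etag from before another update raises `EtagMismatch`, while `update_with_retry` succeeds because it always writes against a fresh read.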

I've looked in the Logs Explorer and found that the error entries are audit logs with the following associated fields:

protoPayload.methodName="google.cloud.aiplatform.internal.MetadataService.RefreshLineageSubgraph"

protoPayload.resourceName="projects/724306335858/locations/europe-west4/metadataStores/default"

This leads me to think that there's an issue with the Vertex metadata store or with the way my pipeline uses it. However, these audit logs are generated automatically, so I'm not sure.
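To pull up the same audit-log entries programmatically instead of through the Logs Explorer, the two fields above can be combined into a single Cloud Logging query joined with `AND`. This is a sketch assuming the `google-cloud-logging` Python client is installed and authenticated; `build_audit_log_filter` and `fetch_entries` are hypothetical helper names, while the project number and region come from the log entry above.

```python
import itertools


def build_audit_log_filter(project_number: str, location: str) -> str:
    """Combine the methodName and resourceName fields into one Logging query."""
    method = "google.cloud.aiplatform.internal.MetadataService.RefreshLineageSubgraph"
    resource = f"projects/{project_number}/locations/{location}/metadataStores/default"
    # Both comparisons must match for an entry to be returned.
    return f'protoPayload.methodName="{method}" AND protoPayload.resourceName="{resource}"'


def fetch_entries(filter_: str, limit: int = 20) -> list:
    """Query Cloud Logging (hypothetical helper; needs credentials to run)."""
    # Imported lazily so build_audit_log_filter works without the library installed.
    from google.cloud import logging

    client = logging.Client()
    # list_entries returns a lazy iterator; islice stops after `limit` entries.
    return list(itertools.islice(client.list_entries(filter_=filter_), limit))


if __name__ == "__main__":
    print(build_audit_log_filter("724306335858", "europe-west4"))
```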

I've tried purging the metadata store as well as deleting it completely. I've also tried running a different model training pipeline, one that had also worked before in a different project, but with no luck.

[Screenshot of the UI]

Jan Zajac
  • Hi @JanZajac, can you try rerunning the pipeline and confirm whether the issue has been resolved? – Prajna Rai T Sep 11 '22 at 07:33
  • Hi @PrajnaRaiT, the pipeline now runs fine, but I had been experiencing the issue described in my post for a few days. Do you know what happened over the weekend that would have fixed this? – Jan Zajac Sep 12 '22 at 08:24
  • It appears to have been a temporary issue, and it is resolved now. You should now be able to rerun the pipeline without it entering the infinite retry loop. – Prajna Rai T Sep 20 '22 at 14:56

1 Answer


The retryable error you were getting was a temporary issue, and it has now been resolved.

You should now be able to rerun the pipeline without it entering the infinite retry loop.

Prajna Rai T