3

I am using vertex-ai endpoints to serve a deep learning service.

My service takes approximately 30s - 2 minutes to respond on CPU depending on the size of the input. I noticed that when the input size takes more than one minute to respond, the API fails, giving me this error:

<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That’s an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That’s all we know.</ins>

When I retry, I keep getting the same error. Once I decrease the input size, the API starts working again. For these reasons, I believe this is a timeout issue.

So my question is: how can I change the timeout value in vertex-ai endpoints? I read through all the documentation, and it doesn't seem to be mentioned anywhere.

Thank you.

user297904
  • 417
  • 1
  • 4
  • 12
  • The answer from @Ricky Nguyen seems to be correct, although Vertex AI endpoints is still in [pre-GA](https://cloud.google.com/terms/service-terms#1) phase with product limited support, I encourage you to report this issue on Google [issue tracker](https://cloud.google.com/support/docs/issue-trackers) making the problem visible to the developers with more chance to improve the product functionality in the future. – Nick_Kh Oct 15 '21 at 06:28

1 Answers1

3

There is an upper limit on the timeout of about 60s plus some extra overhead. So anything approaching 2m is definitely the reason why you're getting that error. It also isn't configurable.

Are there ways to speed up the model serving overhead? Such as deploying on faster hardware, other model optimizations? If you're running a custom container, perhaps take advantage of more cores, reduce any external dependencies