Background

The service is a simple Go program that pipes a file from Cloud Storage to the browser.

Everything works fine on my MacBook, but some requests fail on Cloud Run (managed), mostly for large MP4 files.

Problem

Both the logs and the browser show only a 500 status. My service logs nothing after the point where it starts copying the file: no I/O errors or anything else.

This message is shown 4 seconds before the 500 status:

Container Sandbox Limitation: Unsupported syscall membarrier(0x10,0x0,0x0,0x8,0x775dce0b030,0x775dce0b000). Please, refer to https://gvisor.dev/c/linux/amd64/membarrier for more information.

I cannot reproduce this locally. Works fine locally with the same configuration and GCP buckets.

The service works fine on Cloud Run with smaller files, such as images, just not with the videos I've tried.

I've tried

  • Logging everything up to the io.Copy. No errors; it hangs after io.Copy is called.
  • Increasing the memory of the container. It's now running at 1G. No change from 512M.
  • Running in a Docker container locally with the same configuration, same credentials. No problems.
  • Reaching out to GCP on Twitter

Update 2019-08-16

I created a very simple service that prints 'A' to an http.ResponseWriter. It also works perfectly locally, yet returns 500 on Cloud Run with largish sizes: 1MB OK, 5MB OK, 50MB fails, 100MB fails, etc. There are no membarrier messages when this service runs.

Code is available here: https://github.com/andrioid/reproduce-cloud-run-bug

Reported on the issue tracker as well: https://issuetracker.google.com/issues/139511257

Update 2: Probable cause

There seems to be a hard limit of 32MB on response sizes.

https://cloud.google.com/run/quotas

It's very disappointing that this cannot be increased, and that neither the error nor the log mentions this limit.

Dustin Ingram
Andrioid
  • "Running in a Docker container locally with the same configuration, same credentials. No problems." Did you run it with gVisor? membarrier is an unsupported syscall in gVisor on linux/amd64. – Vitaly Migunov Aug 15 '19 at 07:50
  • No, I haven't tried with gVisor. Do you know how to install it on Mac? I'm not even sure that membarrier is causing this. – Andrioid Aug 15 '19 at 09:55
  • I experienced a similar issue with third-party binaries. Open an issue with support and include a code sample. It will help you and help them investigate the issue. You can try App Engine flex (it worked for me), but it doesn't scale to 0. – guillaume blaquiere Aug 15 '19 at 10:05
  • Has anyone hit the same error and managed to debug a cause other than that quota limitation? I'm not doing anything with large files, and I can't work out why that syscall is being called. – BBerastegui Oct 23 '19 at 23:18

3 Answers

0

There is an outstanding issue at https://github.com/google/gvisor/issues/267 to implement membarrier, but for now this is not allowed by the container sandbox.

Dustin Ingram
  • Do you know what the consequences of it not being implemented are? Stackdriver reports it as DEBUG. Does anyone know what in my Go code could be calling this syscall? The only dependency I have is gocloud.dev. – Andrioid Aug 16 '19 at 07:07
0

Note that you can always report issues on the official Google Cloud issue trackers: https://cloud.google.com/support/docs/issue-trackers.

In most cases, unimplemented system calls in gVisor don't crash the application, because most languages fall back to more primitive or legacy syscalls.

I'd recommend following the issue linked in the other answer and replying to say you hit this on Cloud Run, ideally with a small program that reproduces the case. Such issues are often fixed within a few weeks, depending on the release cycle.

Go doesn't appear to make this syscall in its high-level code [1], but the low-level Go runtime code written in assembly might simply be what triggers it.

ahmet alp balkan
0

The 32 MB limit for HTTP requests and responses is not a Cloud Run limitation; it is a limitation of the GFE (Google Front End) that sits in front of Cloud Run Managed.

Note: I am not including Cloud Run on Kubernetes in this answer, only Cloud Run Managed.

The GFE is a reverse proxy that terminates TCP connections. The GFE provides additional features to Cloud Run such as public IP hosting of its public DNS name, Denial of Service (DoS) protection, and TLS termination.

The GFE is used for many Google services, and for this reason I doubt that this limitation will change in the near future.

John Hanley
  • That explains why the 500 error was coming from something called "Google Frontend". Thanks for the info. I think I can work around the issue by redirecting to storage-links instead of piping it directly. Anything to not wrangle servers myself :-) – Andrioid Aug 17 '19 at 14:05