2

At the top of every minute my code uploads between 20 and 40 files in total (from multiple machines, about 5 files in parallel, until they are all uploaded) to Google Cloud Storage. I frequently get 429 Too Many Requests errors, like the following:

java.io.IOException: Error inserting: bucket: mybucket, object: work/foo/hour/out/2015/08/21/1440191400003-e7ba2b0c-b71b-460a-9095-74f37661ae83/2015-08-21T20-00-00Z/
        at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.wrapException(GoogleCloudStorageImpl.java:1583)
        at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl$3.run(GoogleCloudStorageImpl.java:474)
        ... 3 more
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 429 Too Many Requests
{
  "code" : 429,
  "errors" : [ {
    "domain" : "usageLimits",
    "message" : "The total number of changes to the object mybucket/work/foo/hour/out/2015/08/21/1440191400003-e7ba2b0c-b71b-460a-9095-74f37661ae83/2015-08-21T20-00-00Z/ exceeds the rate limit. Please reduce the rate of create, update, and delete requests.",
    "reason" : "rateLimitExceeded"
  } ],
  "message" : "The total number of changes to the object mybucket/work/foo/hour/out/2015/08/21/1440191400003-e7ba2b0c-b71b-460a-9095-74f37661ae83/2015-08-21T20-00-00Z/ exceeds the rate limit. Please reduce the rate of create, update, and delete requests."
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
        at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl$3.run(GoogleCloudStorageImpl.java:471)
        ... 3 more

I have some retry logic, which helps a bit, but even after some exponential backoff and up to 3 retries, I still often get the error.
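
For reference, the retry wrapper looks roughly like the sketch below. This is a simplified illustration, not the real code: uploadFile is a placeholder for the actual write through the GCS connector, and isRateLimitError is cruder than a proper check on the HTTP status code.

import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;

public class UploadWithRetry {

    // Retries the upload with exponential backoff plus jitter on rate-limit errors.
    static void uploadWithRetry(String bucket, String objectName, byte[] data) throws IOException {
        final int maxRetries = 3;
        long backoffMillis = 1000L;

        for (int attempt = 0; ; attempt++) {
            try {
                uploadFile(bucket, objectName, data); // placeholder for the real GCS write
                return;
            } catch (IOException e) {
                if (attempt >= maxRetries || !isRateLimitError(e)) {
                    throw e;
                }
                long jitter = ThreadLocalRandom.current().nextLong(backoffMillis / 2 + 1);
                try {
                    Thread.sleep(backoffMillis + jitter);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                backoffMillis *= 2; // exponential backoff: 1s, 2s, 4s, ...
            }
        }
    }

    // Crude check for illustration; real code should inspect the response status code.
    static boolean isRateLimitError(IOException e) {
        return e.getMessage() != null && e.getMessage().contains("429");
    }

    // Placeholder: the actual upload goes through the Hadoop GCS connector.
    static void uploadFile(String bucket, String objectName, byte[] data) throws IOException {
        throw new UnsupportedOperationException("placeholder");
    }
}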

Strangely, when I go to the Google Developers Console -> APIs & auth -> APIs -> Cloud Storage API -> Quotas, I see a per-user limit of 102,406.11 requests/second/user. When I look at the Usage tab, it shows no usage.

What am I missing? How do I stop getting rate limited when uploading files to GCS? Why is my quota so high and my usage reported as 0?

Jon Chase

2 Answers

4

Judging by your description of multiple machines all taking an action at the same moment, I suspect all of your machines are attempting to write to exactly the same object name at the same moment. GCS limits the number of writes per second against any single object (1 per second).

Since it looks like your object names end in a slash, like they're meant to be a directory (work/foo/hour/out/2015/08/21/1440191400003-e7ba2b0c-b71b-460a-9095-74f37661ae83/2015-08-21T20-00-00Z/ ), is it possible you meant to end them with some unique value or a machine name or something but left that bit off?
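
If the names do need to be distinct per writer, one fix is to append something unique (a machine name, a UUID) to the object name before uploading. A rough sketch, where buildObjectName and the HOSTNAME lookup are purely illustrative:

import java.util.UUID;

class UniqueObjectNames {
    // Illustrative helper: give every writer a unique object name so no two
    // machines ever create or update the same GCS object in the same second.
    static String buildObjectName(String basePath, String fileName) {
        String machineId = System.getenv("HOSTNAME");
        if (machineId == null) {
            machineId = UUID.randomUUID().toString();
        }
        // e.g. work/foo/hour/out/.../2015-08-21T20-00-00Z/<machineId>/<fileName>
        return basePath + "/" + machineId + "/" + fileName;
    }
}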

Brandon Yarbrough
  • Interesting - that makes sense about updating the same object repeatedly, but I've verified that the objects I am writing don't end in a slash (e.g. the paths are correctly formed and all are unique). I'm using the Google Cloud Storage Connector for Hadoop to do the writing (it's a Spark job), so I guess it must be doing this. Going to try setting fs.gs.implicit.directory.repair to false and see what happens. – Jon Chase Aug 22 '15 at 01:44
  • It looks like it's the GCS connector. I was able to replicate this issue in a test by writing all files to a path like "gs://${basePath}/${fileName}", getting a similar exception for $basePath (ending in a trailing slash like in my question). Making the last bit of the path unique, like "gs://${basePath}/${UUID.randomUUID()}/${fileName}" (generating a new UUID for every file), fixed it. – Jon Chase Aug 22 '15 at 02:13
  • 1
    I was mistaken with my previous comment. Turned out the issue was that I was creating a bunch of new files in the same directory in parallel on many Spark executors (separate JVMs). The GCS connector attempts to create parent directories for files if they don't already exist. The tasks were running in parallel on many machines, so they were all trying to create the same dir in parallel. I resolved it by making sure the parent dir existed, creating it in the Spark driver process before running the parallel file copy tasks on the executors (see the sketch after these comments). – Jon Chase Aug 26 '15 at 15:52
  • Ah, this is a known issue and appears to be fixed with the current latest version. See http://stackoverflow.com/questions/31851192/rate-limit-with-apache-spark-gcs-connector/31955475#31955475 – Brandon Yarbrough Aug 26 '15 at 20:41
  • 1
    Thx for the heads up! I was able to get rid of my hack and use the new GCS Jar to fix the issue. – Jon Chase Aug 27 '15 at 17:47
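
For anyone hitting the same race before upgrading the connector, here is a minimal sketch of the workaround described in the comments above: create the output directory once from the Spark driver so the executors never all try to create the same parent directory. The path and configuration wiring are illustrative.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class EnsureOutputDir {
    // Call this from the Spark driver before launching the parallel copy tasks.
    static void ensureOutputDirExists(Configuration hadoopConf, String outputDir) throws IOException {
        Path dir = new Path(outputDir); // e.g. "gs://mybucket/work/foo/hour/out/..."
        FileSystem fs = dir.getFileSystem(hadoopConf);
        if (!fs.exists(dir)) {
            fs.mkdirs(dir); // creates the placeholder "directory" object once, from one place
        }
    }
}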
3

That error happens when you attempt to update the same object too frequently. From https://cloud.google.com/storage/docs/concepts-techniques#object-updates:

There is no limit to how quickly you can create or update different objects in a bucket. However, a single particular object can only be updated or overwritten up to once per second.
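
One way to respect that limit from application code is a simple per-object throttle. The sketch below is illustrative only and only coordinates threads inside a single JVM, so it would not help across multiple machines; for that you would need to make the object names distinct (as in the other answer) or coordinate externally.

import java.util.concurrent.ConcurrentHashMap;

class PerObjectThrottle {
    // Tracks the last write time per object name and sleeps so that any single
    // object is updated at most once per second (the documented GCS limit).
    private final ConcurrentHashMap<String, Long> lastWriteMillis = new ConcurrentHashMap<>();

    void awaitTurn(String objectName) throws InterruptedException {
        while (true) {
            long now = System.currentTimeMillis();
            Long last = lastWriteMillis.get(objectName);
            if (last == null || now - last >= 1000L) {
                // Claim the slot; if another thread won the race, loop and try again.
                boolean claimed = (last == null)
                        ? lastWriteMillis.putIfAbsent(objectName, now) == null
                        : lastWriteMillis.replace(objectName, last, now);
                if (claimed) {
                    return;
                }
            } else {
                Thread.sleep(1000L - (now - last));
            }
        }
    }
}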

Mike Schwartz