We have a program that fetches the content of a Drive and downloads the files by using the "Export Links" property returned by the API.

To download the file, we use the MediaDownloader class and the request is authenticated (OAuth2).

This works most of the time, and the program does what we expect.

However, we are experiencing a strange issue with some Google Spreadsheets. Randomly, when we download a Google Spreadsheet via the Export Link (xlsx format), instead of receiving an Excel file, we get an HTML page with a warning:

Title: Too Many Requests

Content: Wow, this file is very popular! It might be unavailable until the crowd clears.

As I said, it fails randomly, and it is not always the same files that fail.

Since the request doesn't return an error, it's difficult to handle this case programmatically, for example by retrying the download.
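One cheap way to detect this case without reading the whole document, as a minimal sketch: an xlsx file is a ZIP archive, so it starts with the ZIP signature bytes `PK\x03\x04`, whereas the warning page is HTML. The sketch assumes the downloaded bytes are available in memory. The name `looks_like_xlsx` is hypothetical and not part of any Google client library.

```python
# ZIP local file header signature; every xlsx file begins with these 4 bytes.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the downloaded bytes start with the ZIP signature.

    Hypothetical helper: it only inspects the first 4 bytes, so it does not
    prove the file is a valid spreadsheet, only that it is not an HTML page.
    """
    return data[:4] == ZIP_MAGIC


# Illustrative payloads (assumptions, not real API responses).
html_page = b"<html><head><title>Too Many Requests</title></head></html>"
fake_xlsx = ZIP_MAGIC + b"\x00" * 16

print(looks_like_xlsx(html_page))
print(looks_like_xlsx(fake_xlsx))
```

Checking only the first four bytes avoids searching each document for the warning text, at the cost of not validating the rest of the file.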

Five things that we know:

  • This problem occurs only with Google Spreadsheets, and it occurs randomly.
  • Judging by the URLs of these files, they seem to be in the old spreadsheet format (see: Why are there two different URL formats for Google spreadsheet documents?).
  • We don't receive any exception or error when we download the file.
  • The affected files are not used by several users. We have tried at different times of the day, and no one else is using a file at the same time.
  • Our program fetches all the content and makes about 5-8 simultaneous downloads per second (each of a different file).

Does anyone have any idea about this problem?

mtheriault
  • Google flow control isn't great, so if it's possible to stagger your 5-8 downloads per second, that might help. Alternatively, over the last week there have been a number of Drive issues, so if your problem is recent, it might just be another manifestation of the timeouts and 500s that people are seeing. – pinoyyid Sep 26 '14 at 22:29
  • OK thanks, we tried to stagger the downloads without success... It continues to fail randomly, and only on some Google Spreadsheets. We have had this issue since the beginning of September... A specific spreadsheet can fail, and a few minutes later we are able to export and download it without any error. It's very strange. – mtheriault Oct 02 '14 at 18:44
  • Sounds like general Drive flakiness. This is probably one of those situations where an exponential backoff and retry would help. I have a hunch that many 500s are related to internal timeouts, so it might be exacerbated by the size and complexity of the spreadsheet causing the conversion to time out. – pinoyyid Oct 02 '14 at 18:50
  • Yeah, I would like to implement an exponential backoff and retry for this case, but we don't receive any error code or exception. When we request the document at the Export Link, instead of receiving the binary data, we receive an HTML page with a warning message: "Wow, this file is very popular! It might be unavailable until the crowd clears." We have implemented retries everywhere, but we are unable to handle this case... – mtheriault Oct 02 '14 at 18:54
  • In fact, we don't want to have to check the content of each processed document and search for the warning! :) – mtheriault Oct 02 '14 at 18:58
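
The exponential backoff and retry discussed in the comments above could be combined with a cheap check on the payload, as a rough sketch. It assumes the export is fetched by a caller-supplied function `fetch` that returns the raw bytes, and it treats any payload that does not start with the ZIP signature `PK\x03\x04` (which every xlsx file has) as a failed attempt. The names `download_with_backoff`, `fetch`, `max_attempts`, `base_delay` and `sleep` are hypothetical and not part of any Google client library.

```python
import random
import time
from typing import Callable

# ZIP local file header signature; every xlsx file begins with these 4 bytes.
ZIP_MAGIC = b"PK\x03\x04"


def download_with_backoff(
    fetch: Callable[[], bytes],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it returns ZIP-like bytes, backing off between tries.

    Hypothetical helper: `fetch` stands in for whatever performs the
    authenticated export download and returns the response body.
    """
    for attempt in range(max_attempts):
        data = fetch()
        if data[:4] == ZIP_MAGIC:
            return data
        if attempt < max_attempts - 1:
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError(f"export still not an xlsx after {max_attempts} attempts")


# Usage with simulated responses (assumptions, not real API output).
simulated = iter([b"<html>Too Many Requests</html>", b"PK\x03\x04...xlsx bytes..."])
content = download_with_backoff(lambda: next(simulated), base_delay=0.01)
print(len(content))
```

Passing `sleep` as a parameter is only a convenience for testing; in production the default `time.sleep` would be used.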

0 Answers