
Downloading the PDF blob to Google Drive with the `UrlFetchApp.fetch` method is causing two types of errors:

  1. `</div></div>This file might be unavailable right now due to heavy traffic. <a href="">Try again</a>.</div>` (this HTML is written into the downloaded PDF)

  2. Exception: Timeout

Code Snippet:

function downloadasPDF(optSSId, optSheetId)
{
 var ss = (optSSId) ? SpreadsheetApp.openById(optSSId) : SpreadsheetApp.getActiveSpreadsheet();

 var preURL = ss.getUrl(); // ss is a spreadsheet reference
 var url = preURL.replace(/edit.*/,'');
 var folder = DriveApp.getFolderById(FolderID); // FolderID is defined elsewhere
 // Get array of all sheets in spreadsheet
 var sheets = ss.getSheets();

 for (var i = 0; i < sheets.length; i++) {
   // Sheet count is 100+

   Utilities.sleep(5000); // pass a number, not a string
   var sheet = sheets[i];

   // If provided an optSheetId, only save that sheet.
   if (optSheetId && optSheetId !== sheet.getSheetId()) continue;

   // Additional parameters for exporting the sheet as a PDF
   var url_ext = 'export?exportFormat=pdf&format=pdf'   // export as PDF
     + '&gid=' + sheet.getSheetId()   // the sheet's Id
     + '&gridlines=false';  // hide gridlines

   var options = {
     headers: {
       'Authorization': 'Bearer ' + ScriptApp.getOAuthToken()
     },
     muteHttpExceptions: true,
   };
   var response = UrlFetchApp.fetch(url + url_ext, options);
   var blob = response.getBlob().setName(ss.getName() + ' - ' + sheet.getName() + '.pdf');
   folder.createFile(blob);
 }
}

To counter the above problem I am using:

Utilities.sleep(5000)

But some files are still producing error 1 mentioned above.

Question: Is there a better approach to handle the two cases mentioned above, apart from `sleep`?

Note: I am using G Suite Enterprise. The number of sheets to download is approximately 100-150, with 240 cells filled in each sheet and the rest empty.
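
For reference, the concatenated `url + url_ext` takes the following shape (SPREADSHEET_ID and SHEET_GID here are placeholders, not real values):

https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/export?exportFormat=pdf&format=pdf&gid=SHEET_GID&gridlines=false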

Abin Lakhanpal
  • Does this error happen when trying to download a smaller number of exported spreadsheets? Or do you mean that there is a single spreadsheet with 100-150 sheets and 240 cells filled that causes the error? – Oleg Valter is with Ukraine Jul 11 '20 at 23:55
  • There's one single Google spreadsheet with 100-150 sheets in it. Of course, this is less likely to happen if I keep only 10 sheets in a single spreadsheet. I am looking for the most optimal solution with no error rate, or the best way to handle it. – Abin Lakhanpal Jul 12 '20 at 00:31
  • I cannot understand the relationship between the script in your question and `Number of sheets to download are between 100-150 approx, 240 cells filled & rest empty.`. Is your script used in a loop? If so, can I see your current script? – Tanaike Jul 12 '20 at 01:04
  • Hi @Tanaike - please see OP's comment above - I was also confused by the wording. It seems like there is only one Spreadsheet with 100-150 Sheets. That said, you are probably right about the loop, now that I think about it. Abin, that's a bit strange - 36K cells should not cause significant issues. Is the error consistent, or did it start happening at some point in time? Can you also check if there is a similarity between the files downloaded? – Oleg Valter is with Ukraine Jul 12 '20 at 01:07
  • @Oleg Valter Thank you for your support. In that case, I think that the Google Spreadsheet can be exported as a PDF file. If the number of sheets is the reason for OP's issue, I would like to propose testing several export methods. So, as a new question, we need to know the sample endpoint of `url + url_ext`. – Tanaike Jul 12 '20 at 01:11
  • @Tanaike - and thank you for suggesting the test. Abin? I now think that you are correct about the loop as well - why else would one `sleep` for 5 seconds between calls? – Oleg Valter is with Ukraine Jul 12 '20 at 01:15
  • @OlegValter and Tanaike, thanks for raising it; I updated the code, hope it makes more sense now. No - some of the files download fine as expected; say I download 150 PDF files, 35 of them come up with HTML written in them (error 1 as mentioned above in the question). `url + url_ext` is the concatenated URL. – Abin Lakhanpal Jul 12 '20 at 09:26
  • @AbinLakhanpal - thank you for updating the question and responding. Could you please check if there are similarities between those sheets that fail to export (maybe they are on the heavier side of things)? It is difficult to troubleshoot the `/export` endpoint as it is "semi" public. Please see Tanaike's suggestion about testing different export methods. It's possible that the endpoint has a rate limit, but AFAIK there is no documentation to tell for sure. Btw, since you are on the Enterprise edition, can you initiate a support request to Google as well? Would be interesting to know. – Oleg Valter is with Ukraine Jul 12 '20 at 09:58
  • Also, inspired by TheMaster's answer - try checking if the response is successful (code 200) on the iterations that fail. Because if not, to implement something like an exponential backoff you would need to parse and check the response content (see the sketch after this thread). – Oleg Valter is with Ukraine Jul 12 '20 at 11:32
  • Sure, Oleg, will raise it in the Google forum. Yes, right - a potential problem may be the 30-minute time quota. Let's see. – Abin Lakhanpal Jul 13 '20 at 16:23
  • @AbinLakhanpal - thank you for replying and opening an issue on the tracker - please include the link in the question when you do, so others can star it and help visibility (that said, who knows how long it will take - the amount of issues is like an avalanche). Re: iterations - out of curiosity, did you check if any of the requests return a status code other than `200`? – Oleg Valter is with Ukraine Jul 13 '20 at 21:11
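
Following up on the status-code suggestion above, a minimal sketch of such a check (the `isBadExport` helper is hypothetical; `response` is the variable from the snippet in the question):

function isBadExport(response) {
  // A non-200 status code is a definite failure
  if (response.getResponseCode() !== 200) return true;
  // A successful sheet export should come back as a PDF;
  // the "heavy traffic" error page (error 1) comes back as HTML instead
  return response.getBlob().getContentType() !== 'application/pdf';
}

Inside the loop, this would run right after `UrlFetchApp.fetch`, before `folder.createFile(blob)`.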

1 Answer


Use an exponential backoff function to sleep exponentially on failure. Failure can be checked with `.getResponseCode()`:

var response = (function exponentialBackoff(i) {
  Utilities.sleep(Math.pow(2, i) * 1000); // wait 2^i seconds before each attempt
  let data = UrlFetchApp.fetch(url + url_ext, options);
  if (data.getResponseCode() !== 200) return exponentialBackoff(++i); // failed: back off longer and retry
  else return data; // success: return the response
})(1);
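
In context, this replaces the plain `UrlFetchApp.fetch(url + url_ext, options)` call inside the question's loop; as noted in the comments below, the original fixed `Utilities.sleep(5000)` should then be removed.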
TheMaster
  • Thank you, will give this code a try, but a potential problem here is that exponentialBackoff may turn into "Exception: Timeout", as the time quota for an Enterprise Apps Script run is 30 minutes. – Abin Lakhanpal Jul 13 '20 at 16:20
  • @AbinLakhanpal Remove your original sleep of 5000 and use only this. – TheMaster Jul 13 '20 at 17:51
  • Sure, will try. – Abin Lakhanpal Jul 13 '20 at 19:12
  • @AbinLakhanpal - plus, you can always expand the utility to make it exit after several retries. You just need to check if `i === maxNumberOfRetries` after confirming that the response code is `!== 200`, and exit early (see the sketch after this thread). – Oleg Valter is with Ukraine Jul 13 '20 at 21:13
  • This is wonderful, thank you. I'd been rerunning my script for a subset of sheets that I'd have to identify and list manually. My script is more likely to time out now, but that's far easier to handle. – dannypernik Jul 19 '22 at 03:00
  • In my case, it turns out I do not require the exponential backoff, just a linear one: `Utilities.sleep(i * 1000)` – dannypernik Jul 19 '22 at 05:36
  • @dannypernik Glad to help. If that answers your similar question, shall I close your question as a duplicate? – TheMaster Jul 19 '22 at 08:59
  • @dannypernik OP was using a fixed linear sleep of 5000 ms and it was still failing. While my code does check for failure and retry, the point of exponential backoff is actually to reduce the total time spent waiting: in a series of requests, exponential backoff saves time in the long run compared to linear backoff. Also, if Google changes the limits so that a request sent within 1 s of the previous one counts as heavy traffic (even if the previous request failed with [429](https://stackoverflow.com/a/70352469)), your linear code would require constant modification in a cat-and-mouse game. – TheMaster Jul 19 '22 at 09:14
  • @TheMaster thank you for the explanation. Yes, you can close my question as a duplicate. – dannypernik Jul 20 '22 at 19:02