I am fetching a Google Docs document by taking its URL from `DocumentApp.getActiveDocument().getUrl()` and passing it to `UrlFetchApp.fetch()` to retrieve the complete HTML content of the page. However, the HTML I get this way differs from what I see in the browser using Inspect Element. I also tried Puppeteer to extract the same content, but from the screenshot it took I realised that the HTML it retrieves is from the same document in its signed-out state. Please help me fetch all of the page's HTML content as it appears when I am signed in.

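    function getDocumentHtml() {
      var username = Session.getActiveUser().getEmail(); // not used by the request below
      // OAuth token of the user running the script
      var auth = ScriptApp.getOAuthToken();
      var url = DocumentApp.getActiveDocument().getUrl();
      var header = {"Authorization": "Bearer " + auth};
      var options = {'method': 'get', 'headers': header, 'muteHttpExceptions': true};
      // Returns the HTML served for the URL, not the DOM rendered in the browser
      var resp = UrlFetchApp.fetch(url, options).getContentText();
      return resp;
    }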
ADT
  • Does this answer your question? [Create a post in Blogger with Google Apps Script](https://stackoverflow.com/questions/57979124/create-a-post-in-blogger-with-google-apps-script) – TheMaster Apr 13 '20 at 05:29
  • You need to send the OAuth token in the headers. – TheMaster Apr 13 '20 at 05:30
  • I am new to Apps Script and Google services. I have seen several examples of how to pass the OAuth token in the headers and came up with the code I pasted in my question. Can you please provide a line or so of code showing how to pass it? It would be a great help. – ADT Apr 13 '20 at 15:53
  • `var header = {"Authorization": "Bearer " + auth};` – TheMaster Apr 13 '20 at 16:13
  • I am trying to do the same thing: `var username = Session.getActiveUser().getEmail(); var auth = ScriptApp.getOAuthToken(); var header = {"Authorization": "Bearer " + auth}; var options = { 'method':'get', 'headers':header, 'muteHttpExceptions': true }; var resp = UrlFetchApp.fetch(url, options).getContentText();` I don't understand what mistake I am making. Sometimes it throws "CORS policy: Response to preflight request doesn't pass access control check", and sometimes there is no error at all but I get the same HTML (as if not signed in). Please review. Thanks – ADT Apr 13 '20 at 23:31
  • [Edit] your question to include the latest script and the error messages, and provide a sample URL (change the ID if you want to hide sensitive details). – TheMaster Apr 14 '20 at 03:38
  • Also see if any of the answers [here](https://stackoverflow.com/a/28503601/) work out for you. – TheMaster Apr 14 '20 at 03:43
  • Hello @TheMaster, I have updated the code in the question. I am getting the error below: Access to XMLHttpRequest at 'https://play.google.com/log?format=json&hasfast=true&authuser=0' from origin 'https://docs.google.com' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: The value of the 'Access-Control-Allow-Origin' header in the response must not be the wildcard '*' when the request's credentials mode is 'include'. The credentials mode of requests initiated by the XMLHttpRequest is controlled by the withCredentials attribute. – ADT Apr 14 '20 at 04:35
  • Also, I want to fetch the complete DOM of the Google document. As a workaround I tried Puppeteer, with which I can fetch some DOM elements but not all of them, because it is accessing the public view of the document and not the signed-in one. I need the HTML in which all the tags (including the complete menu list) are available. Thanks for all the help! I would be grateful. – ADT Apr 14 '20 at 04:45
  • Have you tried this? https://stackoverflow.com/a/28503601/ – TheMaster Apr 14 '20 at 06:46
  • Yes, I tried that and it is giving me undesired output. Can you please suggest a way to do this in Puppeteer, because it is giving me the output in right format, it's just that the DOM rendered is of a public page (without sign-in). I want Puppeteer to access my document in which I am signed in. – ADT Apr 14 '20 at 16:11
  • Output from the approach you suggested (truncated it to just show you what I got): – ADT Apr 14 '20 at 16:12
  • **Access to XMLHttpRequest at 'play.google.com/log?format=json&hasfast=true&authuser=0' from origin 'docs.google.com' has been blocked by CORS policy**. This is not from UrlFetchApp, right? It's from Puppeteer. UrlFetchApp doesn't do CORS. What was the output from UrlFetchApp with the code in your question? – TheMaster Apr 14 '20 at 16:15
  • I commented the output just above your comment. Please read. Also, if you can suggest a way to skip authentication/pass credentials in Puppeteer to access my signed-in document, it will resolve my problem. Thanks a lot for your help. – ADT Apr 14 '20 at 16:54
  • Stating my issue again: I am trying to access the complete, dynamically rendered DOM of my Google document. I tried UrlFetchApp, which gives me just static HTML (and not the dynamically rendered DOM). Then I created a Google Cloud Function in which I used Puppeteer. It gives me correctly formatted output (dynamically rendered, since it gives the app a browser environment), but it accesses the public document (without sign-in), so it provides HTML tags/properties for only some of the menu lists (and not all). I now need to pass credentials to Puppeteer, or let it skip authentication, so that it can access the signed-in document. Please help with this. – ADT Apr 14 '20 at 17:02
  • It's better to ask a new question by tagging [tag:puppeteer], so that experts there might be able to answer. – TheMaster Apr 14 '20 at 17:56
  • Thanks, I will do that. However, can you please point me to how to do the same using dynamic rendering in UrlFetchApp? – ADT Apr 14 '20 at 19:24
  • AFAIK, dynamic rendering is impossible with UrlFetchApp. – TheMaster Apr 14 '20 at 19:39
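
For reference, here is a minimal sketch of how the OAuth header discussed in the comments might be combined with an HTML export of the document. It assumes that the document's exported content is acceptable: it requests the Drive API v3 `files.export` endpoint with `mimeType=text/html`, so it returns the document body as HTML and not the editor's dynamically rendered DOM (the menus will not be there). The helper names `buildExportRequest` and `getExportedHtml` are hypothetical, and the sketch also assumes the script's token has been granted a Drive scope that permits export.

```javascript
/**
 * Builds the URL and UrlFetchApp options for exporting a Google Doc as HTML
 * via the Drive API v3 files.export endpoint.
 * Hypothetical helper: pure function, uses no Apps Script services.
 */
function buildExportRequest(docId, token) {
  var url = 'https://www.googleapis.com/drive/v3/files/' +
      encodeURIComponent(docId) + '/export?mimeType=' +
      encodeURIComponent('text/html');
  var options = {
    method: 'get',
    headers: {Authorization: 'Bearer ' + token},
    muteHttpExceptions: true
  };
  return {url: url, options: options};
}

/**
 * Apps Script entry point; runs only inside a script bound to a document.
 * Assumption: the token from ScriptApp.getOAuthToken() carries a Drive scope.
 */
function getExportedHtml() {
  var docId = DocumentApp.getActiveDocument().getId();
  var req = buildExportRequest(docId, ScriptApp.getOAuthToken());
  var resp = UrlFetchApp.fetch(req.url, req.options);
  // muteHttpExceptions is true, so check the status code explicitly
  if (resp.getResponseCode() !== 200) {
    throw new Error('Export failed with HTTP ' + resp.getResponseCode());
  }
  return resp.getContentText();
}
```

Calling `getExportedHtml()` from the bound script should return the exported HTML as a string when the token is accepted. Getting the signed-in editor DOM would still need a real browser session, as noted in the comments.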

0 Answers