
I've been trying to scrape a publicly shared photo album I have on Google Photos. Sharing the album provides a link in the format photos.app.goo.gl/{SOME_ID}. The goal is to retrieve the individual photo URLs: the ones that don't expire, in the lh3.googleusercontent.com format, which can then be embedded in any other website within an <img> tag.

I would like to do this in Google Apps Script and have tried the following:

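function logAlbumPage() {
  // Fetch the shared album's short link and log the HTML that comes back.
  var response = UrlFetchApp.fetch("https://photos.app.goo.gl/{SOME_ID}");
  Logger.log(response.getContentText());
}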

However, the response doesn't contain any of the images, as if the page needed to load further, even though I'm testing this on an album with just 2 photos. If I inspect the page manually, I can clearly see the images and their links in the lh3.googleusercontent.com format. What should I change in my fetch request?

I've seen other implementations (outside of Apps Script) using Axios that managed to get the URLs I want, but I haven't found a way to import the Axios library into Apps Script.

Asked by seb, edited by Nimantha
  • I am absolutely not shocked that Google lazy-loads media resources. You will need to use the API or some other solution that actually renders the webpage and waits for network activity to settle. – tehhowch Jan 10 '20 at 13:57
  • Oh I'm with you regarding lazy loading and why it makes sense. I just don't know how to do precisely what you are saying, waiting for the network activity to settle – either through fetch or Axios (and if Axios then how to import the library into Apps Script). Thanks again! – seb Jan 10 '20 at 14:04
  • You should link the other solutions you've seen. As for telling UrlFetch to "wait" on a page, you cannot. As for using some other request library from Apps Script .gs files, you cannot. You *could* try a client-side HTML solution, where you build your own Apps Script-hosted webpage whose JS can use or reference other libraries via `<script>` tags. – tehhowch Jan 10 '20 at 14:50
  • Sure thing: the two options I have seen are (1) https://medium.com/@ValentinHervieu/how-i-used-google-photos-to-host-my-website-pictures-gallery-d49f037c8e3c (2) https://www.publicalbum.org/blog/embedding-google-photos-albums (see "create embed code") – seb Jan 10 '20 at 15:10


Answer:

You can use the Google Photos API in Apps Script to get the individual photo URLs of a shared album.

More Information:

UrlFetchApp.fetch() returns an HTTPResponse object within Apps Script which contains the headers, the HTML content and other information such as the HTTP response code. Unlike a browser, however, it does not run the page's JavaScript, so anything the page loads dynamically after the initial response (such as lazy-loaded images) will not be in the content you get back. There is also a set of URL Fetch limits, detailed on the Quotas for Google Services page, which can result in a truncated response for sufficiently large pages.
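To check what the fetched HTML actually contains, a small helper can scan the raw text for lh3.googleusercontent.com links. This is a minimal sketch, not a reliable scraper: it assumes the URLs appear literally in the markup (as attributes or quoted strings), and the names extractLh3Urls and logLh3UrlsFromPage are hypothetical. If the links are only added by the page's JavaScript, it will simply return an empty list.

```javascript
/**
 * Hypothetical helper: return the distinct lh3.googleusercontent.com URLs that
 * appear literally in a string of HTML, in first-seen order.
 * URLs that a browser would only add later by running the page's JavaScript
 * are not in the fetched text, so they cannot be found here.
 */
function extractLh3Urls(html) {
  // https://lh3.googleusercontent.com/ followed by URL-safe path characters
  // (letters, digits, "_", "-", "/" and "=" for sizing suffixes like "=w100-h50").
  var pattern = /https:\/\/lh3\.googleusercontent\.com\/[A-Za-z0-9_\-\/=]+/g;
  var matches = html.match(pattern) || [];
  var seen = {};
  return matches.filter(function (url) {
    if (seen[url]) return false; // drop repeats of a URL already kept
    seen[url] = true;
    return true;
  });
}

/**
 * Apps Script usage (hypothetical wrapper): fetch a page and log what was found.
 * Only runs inside Apps Script, where UrlFetchApp and Logger exist.
 */
function logLh3UrlsFromPage(pageUrl) {
  var html = UrlFetchApp.fetch(pageUrl).getContentText();
  var urls = extractLh3Urls(html);
  Logger.log(urls.length + ' lh3 URL(s) found');
  urls.forEach(function (url) { Logger.log(url); });
}
```

An empty result from logLh3UrlsFromPage would confirm that the images are not present in the initial response.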

The Photos API, however, has methods designed specifically for the purpose you are describing, and this data can be retrieved from the mediaItems REST resource. Bear in mind, however, that listing mediaItems returns all photos in your library, not only those in a specific album, so further filtering would be needed from there.
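For the filtering step, the Library API also has a mediaItems:search method that accepts an albumId, so an album's items can be requested directly instead of filtering the whole library. The sketch below is an illustration under assumptions: the function names are hypothetical, ALBUM_ID is a placeholder (it is not the photos.app.goo.gl short-link ID; album IDs come from methods such as albums.list or sharedAlbums.list), and the paging loop takes the HTTP call as a parameter so it can be exercised outside Apps Script.

```javascript
/**
 * Hypothetical helper: collect every media item in one album by calling
 * mediaItems:search repeatedly and following nextPageToken until it is absent.
 * postJson(url, body) must POST `body` as JSON and return the parsed response.
 */
function searchAlbumItems(albumId, postJson) {
  var url = 'https://photoslibrary.googleapis.com/v1/mediaItems:search';
  var items = [];
  var pageToken;
  do {
    var body = { albumId: albumId, pageSize: 100 }; // 100 is the search page-size maximum
    if (pageToken) body.pageToken = pageToken;
    var page = postJson(url, body);
    // mediaItems may be missing, e.g. for an empty album.
    items = items.concat(page.mediaItems || []);
    pageToken = page.nextPageToken;
  } while (pageToken);
  return items;
}

/**
 * Apps Script wrapper (runs only inside Apps Script): performs the POST with
 * the script's OAuth token, assuming the photoslibrary.readonly scope is granted.
 */
function searchAlbumItemsViaUrlFetch(albumId) {
  return searchAlbumItems(albumId, function (url, body) {
    var response = UrlFetchApp.fetch(url, {
      method: 'post',
      contentType: 'application/json',
      headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() },
      payload: JSON.stringify(body)
    });
    return JSON.parse(response.getContentText());
  });
}

// Example (inside Apps Script), with a placeholder album ID:
// searchAlbumItemsViaUrlFetch('ALBUM_ID').forEach(function (item) { Logger.log(item.productUrl); });
```

Injecting postJson keeps the paging logic independent of UrlFetchApp; the wrapper is the only part tied to Apps Script.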

Example Code:

After creating a new project in the Developers Console, enable the Photos Library API from the APIs & Services > Library menu item, then link it to your Apps Script project from the script editor via Resources > Cloud Platform project. Enter the project number of the project you just created in the Developers Console and press Set Project.

Here is a small code snippet which lists up to 10 media items from your Google Photos library and logs each item's productUrl in the Logger:

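function logPhotoUrls() {
  // List the first page of media items in the library, up to 10 items.
  var url = 'https://photoslibrary.googleapis.com/v1/mediaItems?pageSize=10';
  var options = {
    method: 'get',
    // Authorize with the script's own OAuth token (scopes come from the manifest below).
    headers: {
      Authorization: 'Bearer ' + ScriptApp.getOAuthToken()
    },
    muteHttpExceptions: false // throw on error responses instead of returning them
  };

  var response = UrlFetchApp.fetch(url, options);
  // Parse the JSON once; mediaItems is omitted when there is nothing to return.
  var items = JSON.parse(response.getContentText()).mediaItems || [];
  for (var i = 0; i < items.length; i++) {
    Logger.log(items[i].productUrl);
  }
}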
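The productUrl opens the item in the Google Photos web app rather than linking to the image itself. Each MediaItem also carries a baseUrl that points at the image bytes, and the Photos Library API documentation describes appending size parameters such as =w{width}-h{height} to it. Bear in mind that baseUrl values expire after about 60 minutes, so this suits short-lived display rather than permanent embedding. A minimal sketch, with hypothetical helper names, that turns the parsed mediaItems array into sized image URLs:

```javascript
/**
 * Hypothetical helper: append the documented "=w{width}-h{height}" size
 * parameters to a MediaItem baseUrl. The result expires along with baseUrl.
 */
function sizedImageUrl(baseUrl, width, height) {
  if (!baseUrl) throw new Error('MediaItem has no baseUrl');
  return baseUrl + '=w' + width + '-h' + height;
}

/**
 * Hypothetical helper: map MediaItems (as parsed from the API's JSON) to sized
 * image URLs, skipping any item that lacks a baseUrl.
 */
function imageUrlsFromItems(items, width, height) {
  return items
    .filter(function (item) { return Boolean(item.baseUrl); })
    .map(function (item) { return sizedImageUrl(item.baseUrl, width, height); });
}

// Example (inside Apps Script, after parsing the mediaItems response):
// imageUrlsFromItems(items, 1024, 768).forEach(function (u) { Logger.log(u); });
```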

You will also need to edit your appsscript.json manifest, which you can see by following View > Show manifest file, to include the following:

{
  "oauthScopes": ["https://www.googleapis.com/auth/photoslibrary.readonly", 
                  "https://www.googleapis.com/auth/script.external_request"]
}

References:

Answered by Rafa Guillermo, edited by Nimantha
  • This is great, thank you, Rafa. However, that was indeed the first solution I had tried, and I had to discard it because mediaItems[i].productUrl doesn't provide a URL to the image itself but to the Google Photos container of that image (photos.google.com/lr/album/{some_ID}) – which means the image can't be embedded somewhere else. That's why I opted for the fetch attempt, as the lh3.googleusercontent.com/{some_ID} URLs link to the actual JPGs. Do you see what I am trying to do? – seb Jan 10 '20 at 13:43
  • @seb Use the `baseUrl`? If you review the API reference for `MediaItem`, it indicates this URL can be used to access the actual content. – tehhowch Jan 10 '20 at 14:00
  • @tehhowch Thank you for suggesting but unfortunately baseUrl will expire after 60 minutes, see https://developers.google.com/photos/library/guides/access-media-items#base-urls – seb Jan 10 '20 at 14:02
  • @seb Again, not shocked that Google does not easily allow scripted access to embeddable links. If you want to embed content programmatically, you might need to actually host that content too. – tehhowch Jan 10 '20 at 14:09
  • I don't disagree :). However, I have seen others who have managed to scrape the page to get the individual URLs. Granted, it may break in the future. But in the process I'm trying to learn how to fetch a page that doesn't load completely from the get-go, and to do this within Apps Script. The ultimate goal is simply to render my own photos, which already exist in Google Photos. Ironically, I'm able to (manually) embed them in Google Docs (via the Insert Image option), which I then export as HTML files (via Apps Script); analysing that page clearly shows those static lh3.googleusercontent.com links. – seb Jan 10 '20 at 14:20