There are two approaches that avoid having to validate the URL as given. If you are working with Google Docs links exclusively, you can extract the relevant information from the link and build a new link that you know is safe.
Use the document ID with Drive API
If you have access to the file, you can check it against the Drive API.
For example, with a regex like:
/(^https:\/\/docs\.google\.com\/)(.+)(\/d\/)(.{44})/
the fourth capture group, (.{44}), gives you the document ID. You can then check that ID against the Drive API: a 200 response means the ID is valid, and any other response means you should reject it as invalid.
You can then replace the given link with one constructed by your script, for example:
let input = "<THE LINK GIVEN>"
let re = /(^https:\/\/docs\.google\.com\/)(.+)(\/d\/)(.{44})/
let matches = input.match(re)
let service = matches[2]
let id = matches[4]
let url = "https://docs.google.com/" + service + "/d/" + id
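The snippet above throws if the input does not match, because matches is then null. A slightly more defensive variant could look like the sketch below. The function name normalizeDocsLink is hypothetical, and the pattern carries over the same assumption that document IDs are exactly 44 characters long.

```javascript
// Hypothetical helper (not part of the original answer): returns a rebuilt,
// known-safe link, or null when the input does not look like a Docs link.
function normalizeDocsLink(input) {
  // Same pattern as above; assumes document IDs are exactly 44 characters long
  const re = /(^https:\/\/docs\.google\.com\/)(.+)(\/d\/)(.{44})/
  const matches = typeof input === "string" ? input.match(re) : null
  if (!matches) return null
  const service = matches[2] // e.g. "spreadsheets" or "document"
  const id = matches[4]
  // Anything after the ID (such as "/edit#gid=0") is dropped on purpose
  return "https://docs.google.com/" + service + "/d/" + id
}

const fakeId = "A".repeat(44) // placeholder ID, for illustration only
console.log(normalizeDocsLink("https://docs.google.com/spreadsheets/d/" + fakeId + "/edit#gid=0"))
console.log(normalizeDocsLink("https://badsite.com/spreadsheets/d/" + fakeId)) // null
```

Returning null instead of throwing lets the caller decide how to report a rejected link.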
The Drive API check itself can be made with a simple HTTP request, for example:
curl \
'https://www.googleapis.com/drive/v3/files/[DOCUMENT_ID]?supportsAllDrives=true&fields=id&key=[YOUR_API_KEY]' \
--header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
--header 'Accept: application/json' \
--compressed
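The same request could be prepared from script code. The sketch below only builds the URL and request options. buildDriveCheckRequest is a hypothetical name, and the API key parameter from the curl example is left out on the assumption that the OAuth access token is sufficient for your setup.

```javascript
// Hypothetical helper: prepares the Drive API files.get request that the
// curl example above sends. It does not perform the request itself.
function buildDriveCheckRequest(id, accessToken) {
  const url = "https://www.googleapis.com/drive/v3/files/" +
    encodeURIComponent(id) + "?supportsAllDrives=true&fields=id"
  const options = {
    method: "get",
    headers: {
      Authorization: "Bearer " + accessToken,
      Accept: "application/json"
    },
    // Apps Script option: return 4xx/5xx responses instead of throwing
    muteHttpExceptions: true
  }
  return { url: url, options: options }
}

// In Apps Script this could be used roughly as follows (assumes the script
// has been granted a Drive scope):
//   const req = buildDriveCheckRequest(id, ScriptApp.getOAuthToken())
//   const valid = UrlFetchApp.fetch(req.url, req.options).getResponseCode() === 200
```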
Visit the link from a script
Using a server-side script, you can visit the constructed link and see what status code it returns.
The second capture group from the regex above gives you the service (spreadsheets, documents, etc.), and the fourth capture group gives you the ID.
You can then build the URL in the legitimate format and use an Apps Script project or other server-side code to check whether it returns 200 (OK), 403 (Access forbidden), or 404 (Not Found). A 404 means the ID doesn't exist, so the link is likely not legitimate. A 403 could simply mean that the script doesn't have access to the file.
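The status codes described above could be mapped to a verdict with a small helper like this sketch. The name classifyStatus and the verdict strings are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: turns the HTTP status of the constructed link
// into a verdict, following the interpretation described above.
function classifyStatus(code) {
  if (code === 200) return "exists"           // link is reachable
  if (code === 403) return "exists-no-access" // probably a real ID, but no access
  if (code === 404) return "not-found"        // ID doesn't exist, likely not legitimate
  return "unknown"                            // any other status: treat with caution
}

console.log(classifyStatus(200)) // "exists"
console.log(classifyStatus(404)) // "not-found"
```

Keeping 403 separate from 200 lets the caller choose whether an inaccessible file should count as acceptable.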
Unfortunately, you can't run this from client-side JavaScript because of the CORS policy: any fetch or XMLHttpRequest call to these links will fail.
With server-side code such as Apps Script, it could look something like this:
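function checkLegitimacy(input){
  try {
    let re = /(^https:\/\/docs\.google\.com\/)(.+)(\/d\/)(.{44})/
    let matches = input.match(re)
    // No match means the input is not a docs.google.com document link
    if (!matches) throw "Not a Google Docs link"
    let service = matches[2]
    let id = matches[4]
    // Rebuild the link so that only the known-safe parts are requested
    let url = "https://docs.google.com/" + service + "/d/" + id
    // muteHttpExceptions makes fetch return 4xx/5xx responses instead of throwing
    let response = UrlFetchApp.fetch(url, {'muteHttpExceptions': true})
    let code = response.getResponseCode()
    if (code != 200 && code != 403){
      throw "Doesn't exist"
    } else {
      Logger.log("Link OK")
    }
  } catch(e){
    Logger.log("Invalid Link")
  }
}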
function main(){
  checkLegitimacy("https://docs.google.com/spreadsheets/d/<REAL_ID>")
  // Logs "Link OK"
  checkLegitimacy("https://docs.google.com/spreadsheets/d/<FAKE_ID>")
  // Logs "Invalid Link"
  checkLegitimacy("https://badsite.com/spreadsheets/d/<REAL_ID>")
  // Logs "Invalid Link"
}
References