Copying images in from anti-scraper websites. Google Docs handles it easily - anyone know how?

Question

I've been working on a Draft.js plugin that lets the user paste mixed text and image content from websites and have the images uploaded to the server automatically. I quickly realized that this is not easy, because different sites use many different countermeasures against copying and pasting images. Standard image tags in page content are no problem: I grab the src and handle the file upload from the URL. However, many sites use all kinds of trickery to make this a pain. For example, some serve only small thumbnails and require a GET request on the image with a hash key to retrieve a larger version. Others somehow seem to corrupt the image so that it is unreadable by the time it has been retrieved. Others still use unusual embed tags that interfere with Draft.js image blocks.

But then I open up a Google Docs file and find that when I copy images into it from a website, there is never any trouble. All the problematic websites that I find myself writing site-specific retrieval methods for seem to be handled by Google Docs with ease.

Am I using completely the wrong approach by trying to retrieve images from a url? Does Google use a far superior approach (yes, I presume) - in which case, does anyone have any idea what that approach might be?

Are you trying to retrieve the URL on the server side (your back-end) or on the client side (the browser of the user pasting it)? I have found that most successful solutions use the latter. — , Mar 27 '18 at 21:26
The current approach passes the URL to be handled server-side. I think you're right and doing more on the client side is the way forward. I need to look into how image data is handled and how to transfer it from the client side — Cerzi, Mar 27 '18 at 21:41
I would suggest first looking at how image data can be pasted directly from the clipboard (many sites now support this). I don't have a readily-available solution, but one could ideally transform a GET-retrieved URL, grab the image data from that, and handle it in the same process. Two birds with one stone. — , Mar 27 '18 at 21:44
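
As a rough illustration of the clipboard suggestion in the comments, here is a minimal TypeScript sketch that picks image files out of a paste event's clipboard items. The names `PastedItem` and `collectPastedImages` are hypothetical and made up for this example; the `kind`, `type`, and `getAsFile()` members mirror the browser's `DataTransferItem`. This is not a description of what Google Docs does, and it only helps when the browser actually puts image bytes on the clipboard (for example with "Copy image"); a copied page selection often carries only HTML.

```typescript
// Minimal shape of a clipboard item. It mirrors the parts of the browser's
// DataTransferItem used here, so the function can also run outside a browser.
// (Hypothetical helper type, not part of Draft.js.)
interface PastedItem<F> {
  kind: string;            // "file" or "string"
  type: string;            // MIME type, e.g. "image/png" or "text/html"
  getAsFile(): F | null;   // returns null for non-file items
}

// Collect every clipboard item that is a file with an image MIME type.
function collectPastedImages<F>(items: ArrayLike<PastedItem<F>>): F[] {
  const images: F[] = [];
  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    if (item.kind === "file" && item.type.indexOf("image/") === 0) {
      const file = item.getAsFile();
      if (file !== null) {
        images.push(file);
      }
    }
  }
  return images;
}

// Assumed browser usage (editorElement and uploadImage are placeholders):
//
//   editorElement.addEventListener("paste", (event: ClipboardEvent) => {
//     const items = event.clipboardData ? event.clipboardData.items : [];
//     for (const file of collectPastedImages(items)) {
//       uploadImage(file); // e.g. POST the File in a FormData body
//     }
//   });
```

Because the image bytes come from the user's own clipboard, the server never has to request the image from the source site, which would sidestep the thumbnail and hash-key problems for those pastes. Pastes that contain only HTML would still need the URL-based path as a fallback.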

0 Answers