I've been playing around with making a Draft.js plugin that lets the user paste in mixed text and image content from websites and have the images automatically uploaded to the server. I've quickly come to the realization that it's not easy, simply because so many sites use different kinds of counter-measures against copying and pasting images. Standard image tags in page content are no problem - I can easily grab the src and handle the file upload from the URL. However, many sites use all kinds of trickery to make this a pain. For example, some will only serve small thumbnails, and require a GET request on the image with a hash key in order to retrieve a larger version. Others somehow seem to corrupt the image, so that it's unreadable by the time it's been retrieved. Others still play with weird embed tags that mess with Draft.js's image blocks.
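To make the "standard image tag" case concrete, here is a minimal sketch of the step where the src values are pulled out of the pasted HTML. The helper name `extractImageSources` is hypothetical, and the regex is an assumption made to keep the example self-contained; in a browser, parsing the HTML with `DOMParser` would probably be more robust.

```typescript
// Hypothetical helper: collect the src of every <img> tag in pasted HTML.
// Assumption: a regex is good enough for a sketch. It will miss edge cases
// such as a ">" inside an attribute value.
function extractImageSources(html: string): string[] {
  // Match each <img ...> tag as a whole.
  const imgTag = /<img\b[^>]*>/gi;
  // Match a src attribute (double-quoted, single-quoted, or unquoted).
  // The leading \s keeps attributes like data-src from matching.
  const srcAttr = /\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i;

  const sources: string[] = [];
  for (const tag of html.match(imgTag) ?? []) {
    const m = srcAttr.exec(tag);
    const src = m ? (m[1] ?? m[2] ?? m[3]) : undefined;
    if (src) {
      sources.push(src);
    }
  }
  return sources;
}
```

In Draft.js, the pasted HTML is available as the second argument of the editor's `handlePastedText(text, html, editorState)` prop, so something like this could be called from there before handing each URL to the upload code. It only covers the easy case; it does nothing for the thumbnail, hash key, or embed tag problems.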
But then I open up a Google Docs file and find that when I copy images into it from a website, there's never any trouble whatsoever. All the problematic websites that I'm having to write site-specific retrieval methods for seem to be handled by Google Docs with ease.
Am I using completely the wrong approach by trying to retrieve images from a URL? Does Google use a far superior approach (yes, I presume) - and if so, does anyone have any idea what that approach might be?