I am aware that this can't be done with a bash script alone, or at least not as far as I know (and I'm still learning). This is why I'm asking for help. What else do I need? Are there specific tools?

This is what I'd like to do:

  1. Upload an image to https://www.google.com/searchbyimage/upload
  2. Then find all the identical images
  3. Download the one which has the greatest resolution
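
Step 3 of the list above can be sketched on its own, assuming the earlier steps have already produced a list of candidate images with known dimensions. The `Candidate` type, the `pick_largest` helper and the example.com URLs are all hypothetical, made up for illustration; "greatest resolution" is read here as the largest pixel count.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """One search result; every field here is hypothetical illustration data."""
    url: str
    width: int
    height: int


def pick_largest(candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate with the most pixels (width * height)."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=lambda c: c.width * c.height)


# Made-up results standing in for whatever the search step returns.
results = [
    Candidate("https://example.com/zola_small.jpg", 400, 600),
    Candidate("https://example.com/zola_large.jpg", 1600, 2400),
    Candidate("https://example.com/zola_medium.jpg", 800, 1200),
]
best = pick_largest(results)
print(best.url)
```

Comparing pixel counts rather than width alone avoids preferring a wide but heavily cropped copy.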

So far I've been able to upload an image to Searchbyimage with curl. The upload produces a very long token that is then used to search for similar images, together with some additional keywords.

The uploaded image creates a link composed like so:

https://www.google.com/search?tbs=sbi:

After this is the awfully long token: AMhZZith3JfR2OzwmuyQjufBifvdFWNjMShRMypWIE2-g005QfYLeTATLhGHAWz8MLI-tbgHzZp-bREPlJbsNWhY7U4Z2_19bu0oHII6VJPIVVJSPANODqnrJXp6X5VKKoXHMLcBCmI9eIpxS_1EX9g9YJPFL2XFEfJqIApLX83erP5mlRM7rSiIF5Te_1RPNyVkp4IPZPBRtoOKGhpDw2xad-JZsqd2ai4F5sMvyO2A_18PMFKg21nTRH_1jVeOeUhz8U5zkL4lycIg3kafAYlNy8YwmjSFcmc2nZB_10t9MFyi2BnBmemDRp4DCACI0FVM6pLTIB8VCBpU9A

And it adds this at the end: &hl=fr.
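
Taking that link apart again could look like the sketch below. It assumes the link has exactly the shape described above (`tbs=sbi:` followed by the token, then `&hl=fr`); the function name `extract_sbi_token` and the placeholder token are made up for illustration, and how the link is obtained (for instance from the headers printed by `curl -i`) is left out.

```python
from urllib.parse import parse_qs, urlsplit


def extract_sbi_token(search_url: str) -> str:
    """Return the part after 'sbi:' in the tbs query parameter."""
    query = parse_qs(urlsplit(search_url).query)
    for value in query.get("tbs", []):
        if value.startswith("sbi:"):
            return value[len("sbi:"):]
    raise ValueError("no tbs=sbi: token found in URL")


# Placeholder token, not a real one.
example = "https://www.google.com/search?tbs=sbi:EXAMPLE_TOKEN-123_abc&hl=fr"
print(extract_sbi_token(example))
```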

Finally the image is searched, and I can choose between clicking "similar images" and "all sizes" (I want "all sizes", since "similar images" doesn't guarantee the results will be identical). This adds some keywords from Google's analysis of the picture (here, a photograph of Émile Zola) and creates a second token:

The picture I searched here

https://www.google.com/search?safe=strict&hl=fr&

q=emile+zola&tbm=isch

&tbs=simg:

CAQSmQEJthA57uIOXdcajQELEKjU2AQaBggXCD0IQgwLELCMpwgaYgpgCAMSKLQZ9QH3BLMZ2A6xGdcO3w70Ad0OwjrEOqEuwzqiLsE67iSTLoM4oC4aMIk1iw7XQn7Wu55hLB2k-bnfW3_1yf24eA0N-w-baKvWkDj48J67yZZS-uQ-BgjCRQyAEDAsQjq7-CBoKCggIARIEnfZWUgw&sa=X&ved=0ahUKEwi965ashtrhAhWI3eAKHSmRCBwQ2A4IKygB

&biw=1920&bih=944

With the resolution of the picture at the end. The idea is to recreate this second link and then download the highest-resolution image among those Google has found. I have to get the token, but everything else can be derived from the picture file itself: the file is properly named after the picture, so its name can serve as keywords, and its resolution is easily known. I'd like to turn this into a script to download higher-resolution versions of many paintings (over a thousand) that I only have in low quality. Ideally I'd use it quite often. So far I have found how to upload a picture with curl, and it gave me back a token, but an incomplete one. Beyond this, I am completely lost.
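
Rebuilding the second link from those pieces might look like the sketch below. It is only an illustration under stated assumptions: the helper names are hypothetical, the token is a placeholder, the `safe`, `sa` and `ved` parameters from the example link are left out, and the two numbers are simply passed through as `biw` and `bih` the way the link above is read here, without checking how Google actually interprets them.

```python
from pathlib import Path
from urllib.parse import urlencode


def keywords_from_filename(image_path: str) -> str:
    """Turn 'paintings/emile_zola.jpg' into 'emile zola' (hypothetical helper)."""
    stem = Path(image_path).stem
    return " ".join(stem.replace("_", " ").replace("-", " ").split())


def build_all_sizes_url(simg_token: str, image_path: str,
                        width: int, height: int, lang: str = "fr") -> str:
    """Assemble a link shaped like the example: keywords, token, two dimensions."""
    params = {
        "hl": lang,
        "q": keywords_from_filename(image_path),
        "tbm": "isch",
        "tbs": "simg:" + simg_token,  # token must come from an earlier step
        "biw": width,                 # assumption: passed through unchanged
        "bih": height,
    }
    # safe=":" keeps the colon in 'simg:' readable; spaces become '+'.
    return "https://www.google.com/search?" + urlencode(params, safe=":")


print(build_all_sizes_url("EXAMPLE_TOKEN", "paintings/emile_zola.jpg", 1920, 944))
```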

In theory this doesn't seem impossible. The problem is that I'm too much of a newbie: I enjoy Linux and bash a lot so far, but I know very little. I have of course spent some hours googling, but nothing showed up that I knew how to use. There is nothing like it on GitHub either: plenty of scripts search for similar images, but none search for identical ones, and none compare the sizes of the images found. There's also a Python API for reverse image searching, but it didn't seem able to search for identical images, and it seems tied to the Google API, which is problematic. All of this is probably needlessly hard for me because I'm only a beginner and don't know enough to build this script. Yet, maybe because of my lack of knowledge, it doesn't seem impossible at all, and I'm very willing to try, fail, try again, and learn. So here I am to ask: how do I do this? Can it be done in bash only? If not, what else must I use? Or perhaps it cannot be done?

Lastly, I know there is a Google API for reverse image searching. That would be very useful if it weren't limited to a hundred image searches a day: if you want more, you have to pay. At 100 images a day, it would take me around eleven days to reverse search all the images I want in better quality; in the end, I'd be done just as fast searching for them all by hand. Neither of these options seems to be a solution, yet this script doesn't seem impossible. It is only beyond my current abilities.
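
The eleven-day estimate is a ceiling division of the number of images by the daily quota. In the sketch below the quota of 100 comes from the paragraph above, while 1,100 images is an assumed figure, since the question only says "over a thousand".

```python
def days_needed(image_count: int, daily_quota: int) -> int:
    """Whole days required to process image_count items at daily_quota per day."""
    if daily_quota <= 0:
        raise ValueError("daily_quota must be positive")
    # Ceiling division done with integers only.
    return -(-image_count // daily_quota)


print(days_needed(1100, 100))  # 1,100 is an assumed collection size
```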

Thank you in advance if anyone has an idea!

PS: I can use Linux either through WSL or a virtual machine. Both work very well so far, including any command or package. WSL is much faster. And sorry for my English, I'm French!

Second PS: I've been asked to show the code I have, but it doesn't go beyond this:

curl -i -F sch=sch -F encoded_image=@path/to/my/imagefile.jpg https://www.google.com/searchbyimage/upload

This was a partial answer to my question that I found here: How to use google search by image in curl

  • Please show the code you have so far if you have got some parts working. – Mark Setchell Apr 18 '19 at 21:51
  • Generally speaking, a good SO question is about a very specific and narrow technical problem you encounter in the course of writing your software, with everything unrelated to that narrow issue factored out (as described in the [mcve] definition). "How do I accomplish some-larger-goal?"-style questions are thus typically eligible for close as too-broad. – Charles Duffy Apr 18 '19 at 22:40

1 Answer

There are two fundamental ways to use the web programmatically:

  • via API: this is purpose-built for computers to access web resources and is always preferred. You follow strict rules and get well-defined results back.
  • by crawling: this is when the computer pretends to be a user, emulating the clicking on links done in a browser. Basically curl, but over and over again with state stored in between, parameters generated correctly, encoding applied, etc.
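
The "state stored in between" part of crawling can be illustrated with Python's standard library alone. This is a minimal sketch, not a crawler for any particular site: `make_session` and `fetch` are hypothetical helper names, the user agent string is made up, and the actual network call is left commented out. Any real use should first check the target site's terms of service.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_session(user_agent: str = "example-crawler/0.1"):
    """Build an opener that remembers cookies across requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(opener, url: str, timeout: float = 10.0) -> bytes:
    """Request one page; cookies set by the server are kept in the opener's jar."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


opener, jar = make_session()
# page = fetch(opener, "https://example.com/")  # network call, left commented out
print(len(jar))  # number of cookies stored so far
```

Reusing the same opener for every request is what separates this from calling curl repeatedly with no memory of earlier responses.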

As you say, there's an API available, so if it does what you want then it's the right way to go. The fact that it does what you want, but enforces limits, is a very useful sign that what you're trying to do has limits. Those limits will have been carefully set to incentivise you to work within them. Trying to crawl for the same results will likely breach either the limits of Google's terms of service or the limits of your sanity.

So if you really want to work around the API, then use a crawler library such as Python Scrapy. But note that the API limits might be a useful indication of how far you can expect to get without paying.

Heath Raftery
  • Thank you very much for your answer! As @CharlesDuffy pointed out, my question might be too broad as well. I guess I should consider using the API and come back with a more specific question if I run into another problem (which, if the API is well documented, shouldn't happen). –  Apr 18 '19 at 22:53