I am using the Google Custom Search API v1. Has the 100-result limit been lifted? I have read many similar but old questions about this, such as "How to get more than 100 results from Google Custom Search API", as well as threads in the Google community. This is a quick testing tool for Google CSE search.

Is there an "ethical" way to scrape Google Images results without this limit?

programmerwiz32

2 Answers

100 results is still the maximum that the Custom Search API returns for a single query.

If you want "deeper" results, consider making the query more specific.
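
To make the limit concrete, here is a minimal sketch of paging through one query with the Custom Search JSON API. It assumes the documented `start` and `num` request parameters (at most 10 items per request, with `start` 1-based) and the `items[].link` fields of the response. The helper names `page_starts` and `fetch_all` are hypothetical, and `api_key` and `cx` stand for your own API key and search engine ID.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/customsearch/v1"
MAX_RESULTS = 100   # cap on results reachable for a single query
PAGE_SIZE = 10      # the API returns at most 10 items per request


def page_starts(max_results=MAX_RESULTS, page_size=PAGE_SIZE):
    """Return the 1-based `start` offsets needed to cover max_results."""
    return list(range(1, max_results + 1, page_size))


def fetch_all(query, api_key, cx):
    """Collect up to MAX_RESULTS result links for one query (makes network calls)."""
    links = []
    for start in page_starts():
        qs = urllib.parse.urlencode(
            {"key": api_key, "cx": cx, "q": query, "start": start, "num": PAGE_SIZE}
        )
        with urllib.request.urlopen(f"{API_URL}?{qs}") as resp:
            data = json.load(resp)
        items = data.get("items", [])
        if not items:          # no further results for this query
            break
        links.extend(item["link"] for item in items)
    return links
```

Running several narrower queries through a helper like `fetch_all` and merging the links is one way to act on the "more specific query" advice, since each query gets its own 100-result window.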

Andy

As an alternative to the Custom Search API, there is a Google Images API from SerpApi (a paid API with a free plan that handles blocks and parsing on its backend).

The script below pages through the image results and prints every unique link to an image in its original resolution.

You can check the code in the online IDE.

from serpapi import GoogleSearch
import json

image_results = []

# search query parameters
params = {
    "engine": "google",               # search engine. Google, Bing, Yahoo, Naver, Baidu...
    "q": "cat",                       # example search query
    "tbm": "isch",                    # image results
    "num": "100",                     # number of images per page
    "ijn": 0,                         # page number: 0 -> first page, 1 -> second...
    "api_key": "..."                  # serpapi key from https://serpapi.com/manage-api-key
                                      # other query parameters: hl (lang), gl (country), etc
}

search = GoogleSearch(params)         # where data extraction happens

images_is_present = True
while images_is_present:
    results = search.get_dict()       # JSON -> Python dictionary

    # an "error" key means "Google hasn't returned any results for this query."
    if "error" not in results:
        for image in results["images_results"]:
            # keep each original-resolution link only once
            if image["original"] not in image_results:
                image_results.append(image["original"])

        # move on to the next page
        params["ijn"] += 1
    else:
        print(results["error"])
        images_is_present = False

print(json.dumps(image_results, indent=2))

Output:

[
  "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a0/Bombay_Katze_of_Blue_Sinfonie_1_%282010_photo%3B_cropped_2022%29.JPG/1200px-Bombay_Katze_of_Blue_Sinfonie_1_%282010_photo%3B_cropped_2022%29.JPG",
  "https://i.natgeofe.com/n/3861de2a-04e6-45fd-aec8-02e7809f9d4e/02-cat-training-NationalGeographic_1484324_square.jpg",
  "https://cdn.theatlantic.com/thumbor/skBoq_Qs4AAmA8Nrn-9ChphmPFk=/0x176:1387x1910/648x810/media/img/2022/11/21/Cats_new_1/original.jpg",
  "https://www.cdc.gov/healthypets/images/pets/woman-with-cat-asleep-medium.jpg?_=89350",
  "https://www.americanhumane.org/app/uploads/2016/08/animals-cats-cute-45170-min.jpg",
  "https://m.economictimes.com/thumb/height-450,width-600,imgsize-34182,msid-93429238/international-cat-day-2022-all-you-need-to-know-about-date-significance-history.jpg",
  "https://cdn.petcarerx.com/cdn-cgi/image/fit=pad,width=1200cdn.petcarerx.com/LPPE/images/articlethumbs/Cat-Breed-Lifespan-Large.jpg",

  other results ...
]
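
The pagination loop can also be factored into a small function that is easy to test without network access. This is only a sketch: `collect_originals`, `fetch_page`, and `max_pages` are hypothetical names, and it assumes each page is a dictionary shaped like the responses used above (an `images_results` list whose entries have an `original` key, or an `error` key when there are no more results). The `max_pages` cap is an added safety net so the loop cannot run indefinitely.

```python
def collect_originals(fetch_page, max_pages=10):
    """Gather unique 'original' image links, page by page, preserving order."""
    seen = set()
    originals = []
    for page in range(max_pages):
        results = fetch_page(page)        # expected to return a dict for this page
        if "error" in results:            # no more results for this query
            break
        images = results.get("images_results", [])
        if not images:
            break
        for image in images:
            url = image.get("original")
            if url and url not in seen:   # skip duplicates across pages
                seen.add(url)
                originals.append(url)
    return originals


# Stand-in fetcher with made-up data (no network calls).
def fake_fetch(page):
    pages = [
        {"images_results": [{"original": "a.jpg"}, {"original": "b.jpg"}]},
        {"images_results": [{"original": "b.jpg"}, {"original": "c.jpg"}]},
    ]
    return pages[page] if page < len(pages) else {"error": "no more results"}


print(collect_originals(fake_fetch))  # ['a.jpg', 'b.jpg', 'c.jpg']
```

Tracking seen links in a set keeps the duplicate check cheap as the list grows, while the separate list preserves the order in which images were returned.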

You can try the API, explore the query parameters, and inspect the output in the Google Search playground.

Disclaimer: I work for SerpApi.

Denis Skopa