2

I tried to use Web Scraper, but it only works for a few data entries, not for hundreds of data points. Is there a way to scrape a large amount of data using only Web Scraper, or is there a better alternative such as Python? I want to scrape each location's name, address, rating, and website. Thanks for any input!

Sitemap:

{"_id":"mymaps","startUrl":["https://www.google.com/maps/d/u/0/edit?....."],"selectors":[{"id":"activityelement","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"div.i4ewOd-TaUzNb-haAclf","multiple":true,"delay":"1000","clickElementSelector":"div.un1lmc-pbTTYe-ibnC6b","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"},{"id":"activityname","type":"SelectorText","parentSelectors":["activityelement"],"selector":"div.i4ewOd-TaUzNb-r4nke","multiple":false,"regex":"","delay":0},{"id":"activityrating","type":"SelectorText","parentSelectors":["activityelement"],"selector":"span.fO2voc-jRmmHf-LJTIlf-wcwwM-H6j5tf","multiple":false,"regex":"","delay":0},{"id":"activityaddress","type":"SelectorText","parentSelectors":["activityelement"],"selector":".OzwZjf-jRmmHf-MZArnb-KDwhZb div:nth-of-type(3)","multiple":false,"regex":"","delay":0},{"id":"activitywebsite","type":"SelectorLink","parentSelectors":["activityelement"],"selector":"div:nth-of-type(4) a","multiple":false,"delay":0}]}
cgybb

3 Answers

1

You can try the Google Maps Local Results API from SerpApi. It is a paid API with a free plan; it bypasses blocks and parses the data on its backend.

To extract the data you need, fill in the search parameters, for example:

  • 'q': 'your query' is the search query, i.e. the business you want to retrieve information about;
  • 'll': 'coordinates' specifies the GPS coordinates of the location where the q query is applied. It must be built in the following order: @ + latitude + , + longitude + , + zoom, for example @30.2704424,-97.7876713,12.25z. These values can be taken from the Google Maps URL.
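As a rough sketch of how the 'll' value can be assembled or pulled out of a Google Maps URL: the helper names `build_ll` and `ll_from_maps_url` and the example URL are hypothetical, and the trailing `z` follows the example value used in this answer rather than any documented rule.

```python
import re

def build_ll(latitude, longitude, zoom):
    """Assemble the 'll' value as @ + latitude + , + longitude + , + zoom + z."""
    return f"@{latitude},{longitude},{zoom}z"

# Matches the "@lat,lng,zoomz" fragment that appears in Google Maps URLs.
LL_PATTERN = re.compile(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(\d+(?:\.\d+)?)z")

def ll_from_maps_url(url):
    """Return the '@lat,lng,zoomz' fragment of a URL, or None if it is absent."""
    match = LL_PATTERN.search(url)
    return match.group(0) if match else None

# Hypothetical URL, used only for illustration.
example_url = "https://www.google.com/maps/search/restaurant/@30.2704424,-97.7876713,12.25z"
print(build_ll(30.2704424, -97.7876713, 12.25))
print(ll_from_maps_url(example_url))
```

Either value can then be passed as the 'll' entry of the params dictionary.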

Check full code in the online IDE.

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json

params = {
    'api_key': '...',                       # serpapi key, https://serpapi.com/manage-api-key
    'engine': 'google_maps',                # SerpApi search engine 
    'q': 'restaurant',                      # query
    'll': '@30.2704424,-97.7876713,12.25z', # GPS coordinates: Austin, Texas, USA
    'type': 'search',                       # list of results for the query
    'hl': 'en',                             # language
    'start': 0,                             # pagination
}

search = GoogleSearch(params)               # where data extraction happens on the backend

data = []

# pagination
while True:
    results = search.get_dict()             # JSON -> Python dict

    for result in results.get("local_results", []):
        # .get() returns None when a field is missing from a result
        data.append({
            "position": result.get("position"),
            "title": result.get("title"),
            "phone": result.get("phone"),
            "rating": result.get("rating"),
            "reviews": result.get("reviews"),
            "type": result.get("type"),
            "address": result.get("address"),
            "website": result.get("website"),
        })
        
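    # stop when the response has no link to a next page
    next_url = results.get('serpapi_pagination', {}).get('next')
    if not next_url:
        break
    # copy the query parameters of the "next" URL into the current search
    search.params_dict.update(dict(parse_qsl(urlsplit(next_url).query)))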
    
print(json.dumps(data, indent=2, ensure_ascii=False))

Example output:

[
  {
    "position": 2,
    "title": "Corner Restaurant",
    "phone": "(512) 608-4488",
    "rating": 4.8,
    "reviews": 5110,
    "type": "Restaurant",
    "address": "110 E 2nd St, Austin, TX 78701",
    "website": "http://www.cornerrestaurantaustin.com/?scid=bb1a189a-fec3-4d19-a255-54ba596febe2&y_source=1_Mjg3MjEzMC03MTUtbG9jYXRpb24ud2Vic2l0ZQ%3D%3D"
  },
  {
    "position": 3,
    "title": "Kebabalicious",
    "phone": "(512) 466-6997",
    "rating": 4.6,
    "reviews": 1178,
    "type": "Restaurant",
    "address": "1311 E 7th St, Austin, TX 78702",
    "website": "http://www.kebabalicious.com/"
  },
  ...
]
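Since the question asks for name, address, rating, and website, the collected list could be reduced to those columns and saved as CSV. This is a minimal sketch using only the standard library; the function name `to_csv` and the chosen column order are assumptions, and the sample row is taken from the example output above.

```python
import csv
import io

# Columns requested in the question; "title" holds the location name.
FIELDS = ["title", "address", "rating", "website"]

def to_csv(rows, fields=FIELDS):
    """Serialize a list of result dicts to CSV text, keeping only `fields`."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys such as "phone"; restval fills missing ones.
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

sample = [{
    "position": 3,
    "title": "Kebabalicious",
    "phone": "(512) 466-6997",
    "rating": 4.6,
    "address": "1311 E 7th St, Austin, TX 78702",
    "website": "http://www.kebabalicious.com/",
}]
csv_text = to_csv(sample)
print(csv_text)
```

To write a file instead, the same text can be saved with `open("places.csv", "w", newline="", encoding="utf-8")`.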

If you need a more detailed explanation of the code, see the blog post on how to scrape Google Maps Local results with SerpApi.

Disclaimer: I work for SerpApi.

Denis Skopa
  • I'm trying this method, to find a full list of board game cafes in the UK, but it seems to only return around 20 cafes at a time. Any idea how to get a full list for the UK? I feel like "local_results" is exactly what it says... unless there's a radius parameter? – Brian Westerman Apr 04 '23 at 07:57
  • Unfortunately, SerpApi cannot search across city or country borders. To do this, you need to find the latitude and longitude of points covering the area (OpenStreetMap may be suitable for this) and write an algorithm that searches within that boundary without going beyond it. SerpApi itself offers a search around a specific point (coordinates) with pagination. – Denis Skopa Apr 26 '23 at 14:12
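One possible way to approximate the approach from the comment above is to tile a bounding box with a grid of points, run one paginated search per point, and merge the results. The sketch below only generates the 'll' values and de-duplicates results; the function names, the grid size, the zoom level, and the bounding box are illustrative assumptions, not an authoritative boundary or a SerpApi feature.

```python
def grid_points(south, west, north, east, rows, cols):
    """Yield (lat, lng) cell centres of a rows x cols grid over a bounding box."""
    lat_step = (north - south) / rows
    lng_step = (east - west) / cols
    for r in range(rows):
        for c in range(cols):
            yield (south + (r + 0.5) * lat_step, west + (c + 0.5) * lng_step)

def merge_unique(batches):
    """Merge result lists, dropping places already seen (same title and address)."""
    seen = set()
    merged = []
    for batch in batches:
        for place in batch:
            key = (place.get("title"), place.get("address"))
            if key not in seen:
                seen.add(key)
                merged.append(place)
    return merged

# Rough, illustrative box only (assumption): south, west, north, east.
points = list(grid_points(50.0, -8.0, 59.0, 2.0, rows=3, cols=2))

# One 'll' value per grid cell; zoom 10 is an arbitrary choice.
ll_values = [f"@{lat:.4f},{lng:.4f},10z" for lat, lng in points]
print(ll_values)
```

Each value in `ll_values` would be used as 'll' for a separate search, and the per-point result lists passed to `merge_unique`. Neighbouring cells can return the same place, which is why the merge step is needed; a finer grid costs more API calls.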
0

Have you tried Octoparse?

"for hundreds of data points"

It accepts tens of thousands of input URLs.

Igor Savinkin
0

Try downloading the KML file: open the three-vertical-dots menu at the top of the My Map, choose the export option, and tick the 'Export as KML instead of KMZ' checkbox. You'll then get all the geocoordinates in a human-readable XML file.
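Once the KML file is downloaded, it can be read with Python's standard library. The sketch below assumes the usual KML 2.2 layout (Placemark elements with a name and a Point/coordinates entry in longitude,latitude,altitude order); the function name and the sample document are made up for illustration, and an actual My Maps export may contain additional fields.

```python
import xml.etree.ElementTree as ET

# Standard KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def parse_placemarks(kml_text):
    """Return name/lat/lng dicts for every point Placemark in a KML string."""
    root = ET.fromstring(kml_text)
    places = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS).strip()
        coords = placemark.findtext("kml:Point/kml:coordinates",
                                    default="", namespaces=KML_NS).strip()
        if not coords:
            continue  # skip lines and polygons, which have no Point
        lng, lat, *_ = coords.split(",")  # KML order is longitude,latitude[,altitude]
        places.append({"name": name, "lat": float(lat), "lng": float(lng)})
    return places

# Made-up sample standing in for an exported file.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example Cafe</name>
      <Point><coordinates>-97.7431,30.2672,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

print(parse_placemarks(SAMPLE_KML))
```

For a real export, the text can be read with `open("map.kml", encoding="utf-8").read()` and passed to `parse_placemarks`.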

PureJ