
I am trying to write some code for a personal project where I can scrape data from a site while also using that site's query box.

The website I am trying to use is https://www.latlong.net/convert-address-to-lat-long.html, and I want a portion of my program where you input your address.

The request should then go to the URL's address search bar, perform the query, extract the lat/long elements from the page, and store them in a dataframe.

I know I will need to use BeautifulSoup and, from what I've read, possibly mechanize and Selenium, but I got a little lost trying to read up on mechanize.


1 Answer


You might want to use the site's backend endpoint (the one the search form posts to) directly, rather than automating the page.

For example:

import pandas as pd
import requests
from urllib.parse import urlencode

search_query = "Berlin, Germany"

# Form fields the search box sends to the backend.
payload = {
    "c1": search_query,
    "action": "gpcm",
    "cp": "",
}

# Mimic the browser's AJAX request; the cookie comes from an initial visit to the site.
headers = {
    "content-type": "application/x-www-form-urlencoded",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
    "referer": "https://www.latlong.net/convert-address-to-lat-long.html",
    "x-requested-with": "XMLHttpRequest",
    "cookie": "; ".join(
        f"{k}={v}" for k, v
        in requests.get("https://www.latlong.net").cookies.get_dict().items()
    ),
}

# POST the query to the endpoint behind the search form; the response body
# contains the coordinates as comma-separated text.
response = requests.post(
    "https://www.latlong.net/_spm4.php",
    data=urlencode(payload),
    headers=headers,
).text

# Split the query into city/country and the response into lat/long
# to build a single-row dataframe.
df = pd.DataFrame(
    [[*search_query.split(", "), *response.split(",")]],
    columns=["City", "Country", "Latitude", "Longitude"],
)
print(df)

Output:

     City  Country   Latitude  Longitude
0  Berlin  Germany  52.520008  13.404954

PS: Don't overuse this, as the site will throttle your requests; use a VPN if you need to keep querying.
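If you do need to look up several addresses, here is a minimal sketch of spacing the requests out. The look_up helper, the five-second pause, and the example queries are my own placeholders, not anything the site documents; it just wraps the request above in a requests.Session so the cookies from the first visit are reused.

import time
from urllib.parse import urlencode

import pandas as pd
import requests

# Hypothetical helper wrapping the request shown above; the Session
# carries the cookies picked up on the initial visit automatically.
def look_up(session, search_query):
    payload = {"c1": search_query, "action": "gpcm", "cp": ""}
    headers = {
        "content-type": "application/x-www-form-urlencoded",
        "referer": "https://www.latlong.net/convert-address-to-lat-long.html",
        "x-requested-with": "XMLHttpRequest",
    }
    response = session.post(
        "https://www.latlong.net/_spm4.php",
        data=urlencode(payload),
        headers=headers,
    ).text
    return [*search_query.split(", "), *response.split(",")]


session = requests.Session()
session.headers["user-agent"] = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
)
session.get("https://www.latlong.net")  # pick up the site's cookies once

rows = []
for query in ["Berlin, Germany", "Paris, France"]:  # example queries
    rows.append(look_up(session, query))
    time.sleep(5)  # arbitrary pause between queries to stay under the throttle

df = pd.DataFrame(rows, columns=["City", "Country", "Latitude", "Longitude"])
print(df)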
