I want to build a web crawler in R for the website "https://www.latlong.net/convert-address-to-lat-long.html". It should submit an address to the site as a parameter and then fetch the latitude and longitude that the site generates. This would be repeated for every row of my dataset.

Since I am new to web crawling, I would appreciate some guidance.

Thanks in advance.

  • 1
    Since it doesn't have an API, your best bet is RSelenium – OganM Jun 07 '18 at 22:30
  • 1
    It will be really difficult without the webpage having a built-in API. If you're interested in geolocating the addresses without this thought experiment, the ggmap package in R has the tools. See https://stackoverflow.com/questions/44290940/r-geocoding-with-address/44291289#44291289 – Ben Fasoli Jun 07 '18 at 22:31
  • The operation you're going for is called geocoding, and in my experience it's usually not free; at best you can hope for a throttled API that offers a limited number of free requests each day (e.g. Google). As such, it's probably against the terms of service to use this website in the automated way you're hoping for. – MichaelChirico Jun 07 '18 at 22:45
  • Although that website allows web scraping (see siteaddress/robots.txt), in the background it calls https://maps.googleapis.com/maps/js/GeocodeService.Search, which is not free – chinsoon12 Jun 07 '18 at 22:50

1 Answer

In the past I have used an API called ipstack (ipstack.com), which looks up the location of an IP address.

Example: a data frame 'd' that contains a column of IP addresses called 'ipAddress':

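# Placeholder: replace with your own ipstack access key
apiKey <- "INSERT YOUR API KEY HERE"

# Create the result column up front so rows that are not yet processed stay NA
d$ipCountry <- NA_character_

for (i in seq_len(nrow(d))) {
  # Get data from the API and save the response lines to variable 'str'
  lookupPath <- paste0("http://api.ipstack.com/", d$ipAddress[i],
                       "?access_key=", apiKey, "&format=1")
  str <- readLines(lookupPath, warn = FALSE)

  # Save the raw response to a file named after the row index
  writeLines(str, paste0(i, ".txt"))

  # Save the country line to the main data frame 'd' as well
  d$ipCountry[i] <- str[7]
  print(paste("Successfully saved ip #:", i))
}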

In this example, I was specifically after the country of each IP, which appears on line 7 of the data returned by the API (hence the str[7]).
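Relying on a fixed line number breaks if the API ever adds or reorders fields. As a rough sketch, the same value could be picked out by its key name instead. The helper extract_field, the key "country_name", and the sample lines below are assumptions for illustration, not confirmed API output; a proper JSON parser such as jsonlite::fromJSON would likely be more robust.

```r
# Hypothetical helper: pull a string field out of pretty-printed JSON lines
# by key name instead of by line position.
extract_field <- function(lines, field) {
  pattern <- paste0('^\\s*"', field, '"\\s*:\\s*"(.*)",?\\s*$')
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0) return(NA_character_)  # key not present
  sub(pattern, "\\1", hit[1])
}

# Made-up sample shaped like a pretty-printed response (not real API output)
sample_lines <- c(
  "{",
  '  "ip": "192.0.2.1",',
  '  "country_code": "XX",',
  '  "country_name": "Exampleland"',
  "}"
)

country <- extract_field(sample_lines, "country_name")
print(country)
```

Inside the loop, this would stand in for str[7], for example d$ipCountry[i] <- extract_field(str, "country_name"), assuming the response really contains that key.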

This API lets you look up 10,000 addresses per month for free, which was enough for my purposes.
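With a monthly cap on free lookups, it may help to query each distinct IP only once and map the results back onto the data frame. The sketch below assumes a hypothetical lookup_country function standing in for the real API call, and uses made-up sample addresses.

```r
# Hypothetical stand-in for the real API call; returns a fake label
lookup_country <- function(ip) paste0("country-of-", ip)

# Made-up sample data with one repeated address
d <- data.frame(ipAddress = c("192.0.2.1", "192.0.2.2", "192.0.2.1"),
                stringsAsFactors = FALSE)

# Query each distinct address once; vapply names the result by address
unique_ips <- unique(d$ipAddress)
countries <- vapply(unique_ips, lookup_country, character(1))

# Map the per-address results back onto every row
d$ipCountry <- unname(countries[d$ipAddress])
print(d)
```

Here three rows cost only two lookups; on real data the saving depends on how often addresses repeat.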

KamRa