
Many questions about scraping dynamic content have been asked on Stack Overflow. I went through all of them, but none of the suggested solutions worked for the following problem:

Context:

Issue:

I have not been able to access any of the DOM elements on this page. Hints on how to access the search bar and the search button would be a great start. (See page to scrape.) What I want in the end is to go through a list of addresses, launch the search for each one, and copy the information displayed on the right-hand side of the screen.

I have tried the following:

  • Changed the browser for webdriver (from Chrome to Firefox)
  • Added waiting time for the page to load

    try:
        WebDriverWait(self.driver, 10).until(EC.presence_of_element_located((By.ID, "addressInput")))
    # TimeoutException is imported from selenium.common.exceptions
    except TimeoutException:
        print("address input not found")
    
  • Tried to access the element by ID, XPATH, NAME, TAG NAME, etc.; nothing worked.

Questions

  • What else could I try that I have not so far (using Selenium webdriver)?
  • Are some websites really impossible to scrape? (I don't think the city used an algorithm to generate a random DOM every time I reload the page.)
Cœur
  • find the search field with one of the find_by_* methods, send Keys.ENTER – Corey Goldberg Mar 28 '16 at 23:35
  • The problem was that it could not find the elements... not about how to send keys. – Audrey Bascoul Mar 29 '16 at 02:06
  • your question had 2 parts: "if I could get some hints on how to access the search bar, *and* the search button"... I supplied the various methods to look for (`find_by_*`) to locate an element. (the accepted answer chose `find_element_by_id`). Also note, hitting enter to bypass an element lookup and simulated click tends to be faster and more reliable in practice. – Corey Goldberg Mar 29 '16 at 21:08

1 Answer


You can load the URL http://50.17.237.182/PIM/ directly to get the source:

In [73]: from selenium import webdriver


In [74]: dr = webdriver.PhantomJS()

In [75]: dr.get("http://50.17.237.182/PIM/")

In [76]: print(dr.find_element_by_id("addressInput"))
<selenium.webdriver.remote.webelement.WebElement object at 0x7f4d21c80950>

If you look at the source returned by propertymap.sfplanning.org, there is a frame element whose src attribute is that URL:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>

<head>
  <title>San Francisco Property Information Map </title>
  <META name="description" content="Public access to useful property information and resources at the click of a mouse"><META name="keywords" content="san francisco, property, information, map, public, zoning, preservation, projects, permits, complaints, appeals">
</head>
<frameset rows="100%,*" border="0">
  <frame src="http://50.17.237.182/PIM" frameborder="0" />
  <frame frameborder="0" noresize />
</frameset>

<!-- pageok -->
<!-- 02 -->
<!-- -->
</html>
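
As a rough sketch, the frame URL can also be extracted from that source programmatically rather than by eye. The example below uses only the standard library; the names `FrameSrcParser`, `frame_sources`, and `SAMPLE` are made up for illustration, and `SAMPLE` is just the frameset portion of the source shown above.

```python
from html.parser import HTMLParser


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lowercase.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:  # skip frames that have no src attribute
                self.sources.append(src)


def frame_sources(html):
    """Return the src URLs of all frames found in an HTML string."""
    parser = FrameSrcParser()
    parser.feed(html)
    return parser.sources


# Frameset portion of the page source (copied from the answer above).
SAMPLE = """<frameset rows="100%,*" border="0">
  <frame src="http://50.17.237.182/PIM" frameborder="0" />
  <frame frameborder="0" noresize />
</frameset>"""

print(frame_sources(SAMPLE))  # ['http://50.17.237.182/PIM']
```

A non-empty result is a strong hint that the elements you are looking for live inside a frame, which is why lookups against the top-level document fail.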

Thanks to @Alecxe, the simplest method is to use dr.switch_to.frame(0):

In [77]: dr = webdriver.PhantomJS()

In [78]: dr.get("http://propertymap.sfplanning.org/")

In [79]: dr.switch_to.frame(0)

In [80]: print(dr.find_element_by_id("addressInput"))
<selenium.webdriver.remote.webelement.WebElement object at 0x7f4d21c80190>

If you visit http://50.17.237.182/PIM/ in your browser, you will see exactly the same page as propertymap.sfplanning.org/. The only difference is that with the former you have full access to the elements without switching frames.

If you want to input a value and click the search button, it is something like:

from selenium import webdriver


dr = webdriver.PhantomJS()
dr.get("http://propertymap.sfplanning.org/")

dr.switch_to.frame(0)

dr.find_element_by_id("addressInput").send_keys("whatever")
dr.find_element_by_xpath("//input[@title='Search button']").click()
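
Since the goal in the question is to run the search for a whole list of addresses, the snippet above could be wrapped in a loop along these lines. This is only a sketch under some assumptions: `search_addresses` is a hypothetical helper, the locators are the ones used above, and it calls the generic `find_element(by, value)` method with the plain strings `"id"` and `"xpath"` (the values of Selenium's `By.ID` and `By.XPATH`) so it does not depend on the `find_element_by_*` shortcuts. How to read the results panel is left to the caller, because the element that holds it is not identified here.

```python
def search_addresses(driver, addresses, url="http://propertymap.sfplanning.org/"):
    """Submit one search per address, yielding each address after its search.

    `driver` is any Selenium WebDriver instance. This is a generator, so the
    caller can read the result panel while the page still shows that search.
    """
    for address in addresses:
        driver.get(url)            # reload so every search starts from a clean page
        driver.switch_to.frame(0)  # the form lives inside the first frame
        box = driver.find_element("id", "addressInput")  # "id" == By.ID
        box.clear()
        box.send_keys(address)
        # "xpath" == By.XPATH; same locator as in the snippet above
        driver.find_element("xpath", "//input[@title='Search button']").click()
        yield address
```

You would call it as `for address in search_addresses(dr, my_addresses): ...` and scrape inside the loop body. The click returns before the results have necessarily loaded, so an explicit wait on the results element is probably needed before reading it.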

But if you want to pull data, you may find querying by URL an easier option: the query returns some JSON.


Padraic Cunningham