Selenium webdriver with python to scrape dynamic page cannot find element

Question

Many questions on Stack Overflow cover scraping dynamic content. I went through all of them, but none of the suggested solutions worked for the following problem:

Context:

Using Selenium webdriver with python
I mostly followed this resource: http://selenium-python.readthedocs.org/page-objects.html, in particular the Python.org example.
Page to scrape: http://propertymap.sfplanning.org/

Issue:

I have not been able to access any of the DOM elements on this page. Hints on how to access the search bar and the search button would be a great start. What I want in the end is to go through a list of addresses, launch a search for each one, and copy the information displayed on the right-hand side of the screen.

I have tried the following:

Changed the browser for webdriver (from Chrome to Firefox)

Added waiting time for the page to load

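try:
    WebDriverWait(self.driver, 10).until(EC.presence_of_element_located((By.ID, "addressInput")))
# until() raises selenium.common.exceptions.TimeoutException when the wait expires
except TimeoutException:
    print("address input not found")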

Tried to access the element by ID, XPath, name, tag name, etc. Nothing worked.

Questions

What else could I try that I have not so far (using Selenium webdriver)?
Are some websites really impossible to scrape? (I don't think the city used an algorithm to generate a random DOM every time I reload the page.)

find the search field with one of the find_by_* methods, send Keys.ENTER — Corey Goldberg, Mar 28 '16 at 23:35
The problem was that it could not find the elements... not about how to send keys. — Audrey Bascoul, Mar 29 '16 at 02:06
your question had 2 parts: "if I could get some hints on how to access the search bar, *and* the search button"... I supplied the various methods to look for (`find_by_*`) to locate an element. (the accepted answer chose `find_element_by_id`). Also note, hitting enter to bypass an element lookup and simulated click tends to be faster and more reliable in practice. — Corey Goldberg, Mar 29 '16 at 21:08

Padraic Cunningham · Accepted Answer · 2016-03-29T01:50:25.587

You can use this url http://50.17.237.182/PIM/ to get the source:

In [73]: from selenium import webdriver


In [74]: dr = webdriver.PhantomJS()

In [75]: dr.get("http://50.17.237.182/PIM/")

In [76]: print(dr.find_element_by_id("addressInput"))
<selenium.webdriver.remote.webelement.WebElement object at 0x7f4d21c80950>

If you look at the source returned for propertymap.sfplanning.org, there is a frame element whose src is that URL:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>

<head>
  <title>San Francisco Property Information Map </title>
  <META name="description" content="Public access to useful property information and resources at the click of a mouse"><META name="keywords" content="san francisco, property, information, map, public, zoning, preservation, projects, permits, complaints, appeals">
</head>
<frameset rows="100%,*" border="0">
  <frame src="http://50.17.237.182/PIM" frameborder="0" />
  <frame frameborder="0" noresize />
</frameset>

<!-- pageok -->
<!-- 02 -->
<!-- -->
</html>
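
A quick way to spot this kind of wrapper page is to list the frame sources it contains. The following is only a minimal sketch using the standard library; the `FrameFinder` class name is made up for illustration, and the HTML string is trimmed from the source shown above.

```python
from html.parser import HTMLParser


class FrameFinder(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <frame ... /> also end up here, because
        # the default handle_startendtag() delegates to handle_starttag().
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Trimmed copy of the frameset from the page source above.
page_source = """
<frameset rows="100%,*" border="0">
  <frame src="http://50.17.237.182/PIM" frameborder="0" />
  <frame frameborder="0" noresize />
</frameset>
"""

finder = FrameFinder()
finder.feed(page_source)
print(finder.sources)
```

If the list is not empty, the elements you are looking for probably live inside one of those frames rather than in the top-level document.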

Thanks to @Alecxe, the simplest method is to use dr.switch_to.frame(0):

In [77]: dr = webdriver.PhantomJS()

In [78]: dr.get("http://propertymap.sfplanning.org/")

In [79]: dr.switch_to.frame(0)

In [80]: print(dr.find_element_by_id("addressInput"))
<selenium.webdriver.remote.webelement.WebElement object at 0x7f4d21c80190>

If you visit http://50.17.237.182/PIM/ in your browser, you will see exactly the same page as propertymap.sfplanning.org/. The only difference is that with the former you have full access to the elements without switching frames.

If you want to input a value and click the search button, it is something like:

from selenium import webdriver


dr = webdriver.PhantomJS()
dr.get("http://propertymap.sfplanning.org/")

dr.switch_to.frame(0)

dr.find_element_by_id("addressInput").send_keys("whatever")
dr.find_element_by_xpath("//input[@title='Search button']").click()
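
Newer Selenium releases removed the `find_element_by_*` helpers (as of Selenium 4.3) and dropped PhantomJS support, so on a current install the same steps likely need `find_element(by, value)` and a different browser. Here is a sketch under those assumptions for the "list of addresses" case. `search_address` and `run_searches` are hypothetical helper names, and the locator strings `"id"` and `"xpath"` are the values of `By.ID` and `By.XPATH`.

```python
def search_address(driver, address):
    """Type an address into the search field inside the first frame and submit it."""
    # Go back to the top-level document first so repeated calls do not try
    # to switch frames from inside the frame.
    driver.switch_to.default_content()
    driver.switch_to.frame(0)

    field = driver.find_element("id", "addressInput")
    field.clear()
    field.send_keys(address)
    driver.find_element("xpath", "//input[@title='Search button']").click()


def run_searches(addresses, url="http://propertymap.sfplanning.org/"):
    """Open the page in Firefox and search each address in turn (needs selenium)."""
    from selenium import webdriver  # imported lazily; only this function needs it

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        for address in addresses:
            search_address(driver, address)
            # Reading the results panel is left out: its locators are not
            # shown in this thread, and it may need an explicit wait.
    finally:
        driver.quit()
```

Calling `run_searches(["whatever"])` would repeat the example above with Firefox instead of PhantomJS.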

But if you want to pull data, you may find querying the URL directly an easier option; you will get some JSON back from the query.
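
The actual query endpoint and its parameters are not given here; you would need to read them from the browser's network tab. As a rough sketch only, with the endpoint URL and parameter names passed in as placeholders you fill in yourself (`build_query_url` and `fetch_json` are hypothetical helper names), the standard library is enough to send such a query and decode the JSON:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base, params):
    """Append URL-encoded query parameters to an endpoint URL."""
    return base + "?" + urlencode(params)


def fetch_json(url, timeout=10):
    """GET a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint and parameter name, NOT the site's real API.
# Replace both with what the network tab shows for a search request.
example_url = build_query_url("http://example.invalid/query", {"search": "some address"})
print(example_url)
```

Once the real endpoint is known, `fetch_json(build_query_url(...))` would return the parsed response without driving a browser at all.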

I think you just need to switch to the `iframe`: `driver.switch_to.frame(0)`, right? — alecxe, Mar 29 '16 at 01:33
@PadraicCunningham: thank you so much, this is beautiful- I am mad at myself because I never thought about looking at that. I've spent 2 full days on that... — Audrey Bascoul, Mar 29 '16 at 02:05
@AudreyBascoul, you're welcome. The image at the end is not the best, but if you open Firebug or Chrome dev tools and monitor the network requests you will get a clearer picture of what is happening. — Padraic Cunningham, Mar 29 '16 at 02:08
@PadraicCunningham thanks! Sometimes I use Fiddler. Good note on using the url to send the query :) — Audrey Bascoul, Mar 29 '16 at 02:14
