
I'm trying to programmatically (in Python) retrieve account information from this website for a list of properties I have (each identified by BRT number).

This should be very simple, and I've read a few things I've found via Google, but it's all way over my head as I've no web development experience so all the vernacular is in-one-ear-out-the-other.

The procedure should be very simple, as the web page seems very no-frills:

  1. Set brt, e.g. 883309000.

  2. Open the url: http://www.phila.gov/revenue/RealEstateTax/default.aspx.

  3. Select the by BRT Number field and enter brt.

  4. Click the >> button to retrieve property info.

  5. Scrape the bottom line (TOTALS) and the accurate-to date, in this case:

    TOTALS $13,359.83 $2,539.14 $1,417.73 $1,645.59 $18,962.29

and

06/30/2015

I'm principally stuck on steps 3 and 4. I've gotten as far as:

import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36')]
br.open('http://www.phila.gov/revenue/RealEstateTax/default.aspx')

# Name the parser explicitly so BeautifulSoup doesn't have to guess one
soup = BeautifulSoup(br.response().read(), 'html.parser')

# Here's the BRT Number field
soup.find("input", {"id": "ctl00_BodyContentPlaceHolder_SearchByBRTControl_txtTaxInfo"})

# Here's the "Lookup by BRT" button
soup.find("input", {"id": "ctl00_BodyContentPlaceHolder_SearchByBRTControl_btnTaxByBRT"})

But I am really lost on what to do from there. Any help would be appreciated.

MichaelChirico
1 Answer


Have you considered using the selenium package for Python? The documentation for this is here. I strongly suggest you read it through, run a few basic tests to check your understanding, and skim it again before starting.

The point of Selenium is to load the page as you would in your browser and perform commands on it (which you can automate using Python code).

First import Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

Then start the webdriver and load the page. The assert checks that the page title contains "Revenue Department" before proceeding.

driver = webdriver.Firefox()
driver.get("http://www.phila.gov/revenue/RealEstateTax/default.aspx")
assert "Revenue Department" in driver.title

Next, select the BRT input box and type brt into it.

# find_element(By.ID, ...) is the Selenium 4 spelling; older releases also offered find_element_by_id(...)
driver.find_element(By.ID, "ctl00_BodyContentPlaceHolder_SearchByBRTControl_txtTaxInfo").send_keys(str(brt))

Finally, click the >> button.

driver.find_element(By.ID, "ctl00_BodyContentPlaceHolder_SearchByBRTControl_btnTaxByBRT").click()

After the click, the browser should load the page of results.
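
For step 5 of the question (scraping the TOTALS line and the accurate-to date), here is a minimal sketch of how the text could be parsed once it has been pulled off the results page. It is not tested against the live site. It assumes the TOTALS row appears on a single line of the page text and that the first MM/DD/YYYY date in the text is the accurate-to date. The function names parse_totals and parse_first_date are made up for illustration, and the sample string is the one from the question.

```python
import re
from datetime import datetime
from decimal import Decimal

# Matches dollar amounts such as "$13,359.83".
AMOUNT_RE = re.compile(r"\$([\d,]+\.\d{2})")
# Matches dates written as MM/DD/YYYY, such as "06/30/2015".
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def parse_totals(page_text):
    """Return the dollar amounts on the first line starting with TOTALS."""
    for line in page_text.splitlines():
        if line.strip().startswith("TOTALS"):
            # Drop thousands separators and use Decimal to avoid float rounding.
            return [Decimal(m.replace(",", "")) for m in AMOUNT_RE.findall(line)]
    return None


def parse_first_date(page_text):
    """Return the first MM/DD/YYYY date in the text as a date, or None."""
    match = DATE_RE.search(page_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%m/%d/%Y").date()


# Sample text copied from the question. On the live page the text would have
# to come from the driver (assumption: the visible text of the page body).
sample = "TOTALS $13,359.83 $2,539.14 $1,417.73 $1,645.59 $18,962.29\n06/30/2015"
totals = parse_totals(sample)
as_of = parse_first_date(sample)
```

With the sample above, totals holds five Decimal values and as_of is a datetime.date. If the real page lays the row out differently (for example, one table cell per line), the line-based matching would need adjusting.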

Parsa
  • This worked beautifully! I'm just worried now that it'll be slow since I need to do this 24,000 times – MichaelChirico Jul 03 '15 at 22:01
  • It might take a while. Maybe run the script during off-peak server times (overnight) and perhaps put a random delay of a few seconds after each iteration? Good luck. – Parsa Jul 03 '15 at 22:03
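
The random delay suggested in the last comment could look something like the sketch below. The helper run_with_delays and its parameters are hypothetical, not part of Selenium or any other library. The sleep argument exists only so the pause can be swapped out when trying the function without actually waiting.

```python
import random
import time


def run_with_delays(items, task, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call task(item) for each item, pausing a random interval between calls."""
    results = {}
    for index, item in enumerate(items):
        if index > 0:
            # Wait between min_delay and max_delay seconds before the next request.
            sleep(random.uniform(min_delay, max_delay))
        results[item] = task(item)
    return results
```

Here task would be whatever function performs one lookup for a single BRT number, for example a wrapper around the Selenium steps in the answer. At 24,000 lookups and an average pause of two seconds, the delays alone add roughly 13 hours, so the bounds are worth choosing deliberately.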