I'm relatively new to BeautifulSoup and am trying to extract data from this webpage: http://reports.workforce.test.ohio.gov/program-county-wia-reports.aspx?name=GTL8gAmmdulY5GSlycy7WQ==&dataType=hIp9ibmBIwbKor1WvT5Bkg==&dataTypeText=hIp9ibmBIwbKor1WvT5Bkg==#
I would like to grab the numbers under the headings "Program Completers", "Employed Second Quarter", etc. A relevant part of the HTML is:
<ul class="listbox">
<li class="li1">
<p style="cursor:help" class="listtop" title="WIA Adult
completers are those individuals who have exited a WIA Adult program from
which the individual received a core staff-assisted service (such as job
search or placement assistance) or an intensive service (such as
counseling, career planning, or job training). Those individuals who
participated in WIA through self-service, like OhioMeansJobs.com, or other
less intensive programs are not included in the dashboard.">Program
Completers</p>
<p id="programcompleters1">18</p></li>
I want the strings "Program Completers" and "18". I have tried the solutions here, here, and here, but without much luck. One version of my code is:
from bs4 import BeautifulSoup
import urllib2
url="http://reports.workforce.test.ohio.gov/program-county-wia-reports.aspx?name=GTL8gAmmdulY5GSlycy7WQ==&dataType=hIp9ibmBIwbKor1WvT5Bkg==&dataTypeText=hIp9ibmBIwbKor1WvT5Bkg=="
hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36',
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
for tag in soup.find_all('ul'):
    print tag.text, tag.next_sibling
This returns text, but only from other parts of the page that also use 'ul' tags. I have not been able to grab any text from inside the chart area. How can I retrieve the text I want?
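To make the goal concrete, here is a minimal sketch of the result I am after, run against a shortened copy of the HTML snippet above rather than the live page. The `extract_pairs` helper name, the trimmed `title` attribute, and the closing `</ul>` are my own additions for illustration, and I am assuming the `listbox` and `listtop` classes are used the same way for the other headings. I have not confirmed that the live page includes these values in the HTML that urllib2 receives, so this may only work on the snippet.

```python
from bs4 import BeautifulSoup

# Shortened copy of the snippet above: the long title attribute is trimmed
# and a closing </ul> is added (both are assumptions for illustration).
html = """
<ul class="listbox">
<li class="li1">
<p style="cursor:help" class="listtop" title="WIA Adult completers ...">Program
Completers</p>
<p id="programcompleters1">18</p></li>
</ul>
"""


def extract_pairs(markup):
    """Return (heading, value) pairs from each <li> inside ul.listbox."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    # Restrict the search to the list with class "listbox" instead of every <ul>.
    for li in soup.select("ul.listbox li"):
        heading = li.find("p", class_="listtop")
        if heading is None:
            continue
        # The number sits in the next <p> after the heading.
        value = heading.find_next_sibling("p")
        if value is None:
            continue
        # Collapse the line break inside the heading text into a single space.
        label = " ".join(heading.get_text().split())
        pairs.append((label, value.get_text(strip=True)))
    return pairs


pairs = extract_pairs(html)
print(pairs)
```

On the snippet this gives one pair, the heading "Program Completers" with the value "18", which is the shape of output I would like for every heading in the chart area.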
Thank you for any help!