0

I'm trying to build a web scraper that visits school district websites and retrieves the names and websites of the schools. I'm using https://www.dallasisd.org/ to test the code below.

I'm currently stuck on how to 1) only access the dropdown list of 'Schools' and 2) retrieve the links in the <li> tags in the same dropdown.

Any help would be much appreciated! Thank you.

from bs4 import BeautifulSoup
from selenium import webdriver
import urllib.request
import requests
import re
import xlwt
import pandas as pd
import xlrd
from xlutils.copy import copy
import os.path

hdr = { 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)' }
browser = webdriver.Chrome()
url = 'https://www.dallasisd.org/'
browser.get(url)
html_source = browser.page_source
browser.quit()
soup = BeautifulSoup(html_source, "lxml")
for name_list in soup.find_all(class_ ='sw-dropdown-list'):
    print(name_list.text)
watwat7225
  • 13
  • 2

1 Answers1

0

The dropdown lists of elementary schools are contained in the <div id="cs-elementary-schools-panel" [...]> which you could access prior to finding all and obtain the links:

from bs4 import BeautifulSoup
import requests

headers = {
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

url = 'https://www.dallasisd.org/'

req = requests.get(url, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')
dropdown = soup.find('div', attrs={'id': "cs-elementary-schools-panel"})

for link in dropdown.find_all('li', attrs={'class': "cs-panel-item"}):
   print("Url: https://www.dallasisd.org" + link.find('a')['href'])

You can easily extend this code to the Middle and High schools

user13055597
  • 47
  • 1
  • 6