I'm trying to build a web scraper for Zillow, and I have found a way to obtain the raw data from a Zillow search page. However, I am fairly new to Python and very new to web scraping, and I have no idea how to extract the information I need from it. How should I go about doing this?
Portion of the raw info (from the query "Houston, Texas"): https://paste.ee/p/QyJJG
Current code (it just scrapes the info for the desired area and prints it to the console):
from bs4 import BeautifulSoup as soup
import numpy as np
import pandas as pd
import requests
import random
headers = {
    'authority': 'www.zillow.com',
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9',
    'sec-ch-ua': '"Not_A Brand";v="99", "Google Chrome";v="109", "Chromium";v="109"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36' + str(random.randint(1, 1000)),  # randomint bypasses captcha
}
with requests.session() as i:
    # user input
    city = str(input("City To Search In: ")) + "/"
    # initializers
    page = 1
    end_page = 10
    url = ""
    url_list = []
    request = ""
    request_list = []
    # build one search URL per results page
    while page <= end_page:
        url = "https://www.zillow.com/homes/for_sale/" + city + str(page) + "_p"
        url_list.append(url)
        page += 1
    for j in url_list:  # can change j to "url" if you want but i like "j" for simplicity
        request = i.get(j, headers=headers)
        request_list.append(request)
    # another two initializers (separating this felt more organized to me, sorry if it looks messy)
    rawInfo = ""
    rawInfoList = []
    # parse each response into a BeautifulSoup object
    for request in request_list:
        rawInfo = soup(request.content, "html.parser")
        rawInfoList.append(rawInfo)
    print(rawInfoList)
I tried using find() and find_all(), but I was confused because they always returned no results. I need the address, rent Zestimate, link, and Zestimate/price of each listing in an area. All of the information I need is in the raw info paste; I just need a way to pull it out and make it more readable. I'm trying to build something that takes these pieces of information and adds each one to a separate list per type of info (price, rentZestimate, etc.).
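One possible direction, sketched under assumptions rather than verified against the paste: if the listing data sits in a JSON blob inside a <script> tag (which might explain why tag searches find nothing), it can be parsed with the json module and walked recursively. The helper names (embedded_json, collect_listings, to_columns) are hypothetical, and so are the key names "address", "price", "zestimate", and "detailUrl"; only rentZestimate is taken from the description above. They would need to be checked against the real data.

```python
import json

# Hypothetical field names: "rentZestimate" comes from the question, the
# others are assumptions to adjust after inspecting the actual JSON.
WANTED_KEYS = ("address", "price", "zestimate", "rentZestimate", "detailUrl")


def embedded_json(page):
    """Yield every JSON document found inside the <script> tags of a page.

    `page` is a parsed BeautifulSoup object, such as an item of rawInfoList.
    """
    for script in page.find_all("script"):
        text = (script.string or "").strip()
        # Assumption: the JSON may be wrapped in an HTML comment.
        if text.startswith("<!--") and text.endswith("-->"):
            text = text[4:-3].strip()
        if not text.startswith(("{", "[")):
            continue  # ordinary JavaScript, not a JSON payload
        try:
            yield json.loads(text)
        except json.JSONDecodeError:
            continue  # started like JSON but was not valid JSON


def collect_listings(node, required=("address", "price")):
    """Recursively collect every dict that contains all `required` keys."""
    if isinstance(node, dict):
        if all(key in node for key in required):
            # Do not descend further, so nested copies are not counted twice.
            return [node]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return []
    found = []
    for child in children:
        found.extend(collect_listings(child, required))
    return found


def to_columns(listings, keys=WANTED_KEYS):
    """Build one list per field; a missing field becomes None."""
    return {key: [listing.get(key) for listing in listings] for key in keys}
```

With the rawInfoList from the code above, each page could be passed through embedded_json, each resulting document through collect_listings, and the combined listings through to_columns, giving one list per field, for example columns["price"] and columns["rentZestimate"]. Matching on required keys instead of a fixed path is a deliberate choice here, since the exact nesting of the JSON is not known.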