1

I am trying to get a VBA webscraping tool to pull the data from Zillow for all sold properties within the last 24 months and paste it in an analytical format. The issues I am running into are:

1- the data is pulled as HTML format so it is stuffed into a single cell. What I have tried A-tried first to do a blanket convert to colluom function but the lengths are all variable and it will also sometimes cut off anything past the address. B- tried to copy and paste to a .txt file but the path kept breaking somewhere and not all the data made it.

2- clicking the "Next page" button. What I have tried A-searching href tags to create next url, the keep coming back as "Null" and break the chain B- using Css selector to pick the a tag with title "Next page". I think this is working, but it just keeps clicking the pages without downloading the data after page 1.

Bonus points if you can: -Also download the link href. See point 2 as I keep getting null whenever I try to get the href.

Make it so the base url pulls from a cell on sheet1 so I can make it more dynamic. Whenever I try it either tells me a specific range is not constant or the object is not valid.

Any help or clarity on what I am doing wrong would be helpful! Code:


Option Explicit

Sub GetZillowSold()

    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    
    Dim ListCards As MSHTML.IHTMLElementCollection
    Dim InfoCard As MSHTML.IHTMLElement
    
    Dim SoldList As MSHTML.IHTMLElementCollection
    Dim SoldDate As MSHTML.IHTMLElement
    
    Dim Zpages As MSHTML.IHTMLElementCollection
    Dim Zpage As MSHTML.IHTMLElement
    
    Dim CardID As Integer
    
    XMLReq.Open "GET", "https://www.zillow.com/the-colony-tx/sold/house_type/3-_beds/", False
    XMLReq.send
    
    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
     End If
     
     Worksheets.Add
     Range("A1").Value = "Address"
     Range("b1").Value = "Price"
     Range("c1").Value = "Bedroom"
     Range("d1").Value = "Bath"
     Range("e1").Value = "Sqft"
     Range("f1").Value = "Date"
     Range("A2").Select
     
     HTMLDoc.body.innerhtml = XMLReq.responseText
     Set XMLReq = Nothing
    
     
     
      Set ListCards = HTMLDoc.getElementsByClassName("list-card-info")
      
         'Debug.Print ListCards.Length
       
       For Each InfoCard In ListCards
         ActiveCell.Value = InfoCard.innerText
         ActiveCell.Offset(1, 0).Select
         
       Next InfoCard
       
       Range("f2").Select
       Set SoldList = HTMLDoc.getElementsByClassName("list-card-top")
      'Debug.Print SoldList.Length,
    
      For Each SoldDate In SoldList
         ActiveCell.Value = Mid(SoldDate.innerText, 6)
         ActiveCell.Offset(1, 0).Select
     Next SoldDate
     
    Set Zpages = HTMLDoc.getElementsByTagName("a")
     
     For Each Zpage In Zpages
        If (Zpage.getAttribute("title") = "Next page") Then
        Zpage.Click
        End If
        Exit For
        Next Zpage
End Sub


Warcupine
  • 4,460
  • 3
  • 15
  • 24
Punkmato
  • 11
  • 2
  • Thanks! First time Using this site and the other parts would not stay in the brackets – Punkmato Oct 30 '20 at 14:59
  • The page likely does a POST/GET to retrieve next set of results. Look for that in the network tab when navigating pages. Extract number of pages and loop issuing requests to get all results. A .click won't work with xmlhttp request as there are no active htmlevents/javascript. When clicking on the page you actually have a search query construct in the url which updates with page number. It is urlencoded version of: ` – QHarr Oct 30 '20 at 20:48
  • `https://www.zillow.com/the-colony-tx/sold/2_p/?searchQueryState={"pagination":{"currentPage":2},"mapBounds"{"west":-97.03619334228515,"east":-96.76634165771483,"south":33.00856360111178,"north":33.17751774912523},"regionSelection":[{"regionId":27398,"regionType":6}],"isMapVisible":true,"filterState":{"beds":{"min":3},"sort":{"value":"globalrelevanceex"},"fsba":{"value":false},"fsbo":{"value":false},"nc":{"value":false},"fore":{"value":false},"cmsn":{"value":false},"auc":{"value":false},"pmf":{"value":false},"pf":{"value":false},"rs":{"value":true},"ah":{"value":true}},"isListVisible":true,` – QHarr Oct 30 '20 at 20:49
  • `"mapZoom":12}` – QHarr Oct 30 '20 at 20:49
  • However, doesn't zillow offer an API? – QHarr Oct 30 '20 at 20:55
  • The web api extension we have doesn't pull the sale date. Thanks will try this out and let you know how it works. I have been using VBA for a while but webscraping is new for me – Punkmato Oct 30 '20 at 21:11

0 Answers0