So, I've been trying to scrape data from a website, but I just can't reach my goal.
I'm new to VBA, and I've tried to learn the basics in order to understand some code.
So far I have the code below, but it only scrapes the data from the 1st div, and it puts everything into one cell. I need the macro to run through the whole page and scrape all the data with the class name I specify into different cells (e.g. 1st div to cell A1, 2nd div to cell A2, and so on).
Could you help me or give me some pointers on what I'm doing wrong, please? Thank you!
Code:

Sub BoschRoupa()

    Dim ieObj As InternetExplorer               'requires reference: Microsoft Internet Controls
    Dim htmlEle As IHTMLElement                 'requires reference: Microsoft HTML Object Library
    Dim i As Integer

    i = 1

    Set ieObj = New InternetExplorer
    ieObj.Visible = False
    ieObj.navigate "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100"

    Application.Wait Now + TimeValue("00:00:05")

    'Note: the (0) index grabs only the FIRST element with this class name
    For Each htmlEle In ieObj.document.getElementsByClassName("w-product__content")(0).getElementsByTagName("div")
        With ActiveSheet
            .Range("A" & i).Value = htmlEle.Children(0).textContent
        End With

        i = i + 1
    Next htmlEle

End Sub

1 Answer


You can use XMLHTTP rather than a browser, then the following loop to write out all the div info. I would probably be more selective in how I grab only the data of interest (a sketch of that follows the code), but the following is, I hope, in the spirit of what you asked for.

Option Explicit
Public Sub GetInfo()
    Dim data As Object, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
    Set html = New HTMLDocument                  '<== VBE > Tools > References > Microsoft HTML Object Library
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100", False
        .send
        html.body.innerHTML = .responseText      'parse the response into a DOM we can query
    End With
    Set data = html.getElementsByClassName("w-product__content")
    For Each item In data                        'one row per product
        r = r + 1: c = 1
        For Each div In item.getElementsByTagName("div")   'one column per div within the product
            ThisWorkbook.Worksheets("Sheet1").Cells(r, c) = div.innerText
            c = c + 1
        Next
    Next
End Sub
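
As a sketch of the "more selective" approach mentioned above: rather than dumping every div, you could target specific child elements. The class names `w-product__title` and `w-product__price` below are hypothetical placeholders, so inspect the page's actual markup before relying on them.

Public Sub GetSelectedInfo()
    Dim html As HTMLDocument, item As Object, r As Long
    Set html = New HTMLDocument                  'same HTML Object Library reference as above
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100", False
        .send
        html.body.innerHTML = .responseText
    End With
    For Each item In html.getElementsByClassName("w-product__content")
        r = r + 1
        'querySelector returns Nothing when the selector finds no match, so guard each read
        If Not item.querySelector(".w-product__title") Is Nothing Then
            ThisWorkbook.Worksheets("Sheet1").Cells(r, 1) = item.querySelector(".w-product__title").innerText
        End If
        If Not item.querySelector(".w-product__price") Is Nothing Then
            ThisWorkbook.Worksheets("Sheet1").Cells(r, 2) = item.querySelector(".w-product__price").innerText
        End If
    Next
End Sub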
QHarr
  • Yes, it was exactly like that! The only "issue" is that I would prefer to save the data vertically, but it works and it's what I wanted, thanks a lot! –  Mar 01 '19 at 13:15
  • Sorry to bother you again, but what if I need to run it over several pages (say the link has a pagination script that limits each page to 48 records but there are around 200 in total)? –  Mar 01 '19 at 13:43
  • Then use a last-row function [1](https://stackoverflow.com/a/54055647/6241235) [2](https://stackoverflow.com/a/53556545/6241235) to determine the value of r (the row number) for writing out to the sheet. You find the last populated row and add 1 to give the value of r to continue writing out. r must then not be reset to 1 within any loop. Or simply ensure that r keeps being incremented so you write out to each new row. – QHarr Mar 01 '19 at 13:52
  • Are you asking how to retrieve data for other pages or, if you can already loop all pages, how to write it out to the sheet? – QHarr Mar 01 '19 at 14:44
  • What I want is to do exactly the same thing the code you helped me with does, but instead of hitting only that exact page, go through all the available pages, like `.Open "GET", "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS?per_page=100&page=2"` <- and from there run through pages 3, 4, 5, 6... etc. Is that possible? :s –  Mar 01 '19 at 14:51
  • Calculate the total number of results if possible. Then ascertain how many items you get per page via the per_page argument of the URL (e.g. per_page=100). Based on that, work out how many pages you are likely to need and, yes, construct a loop which generates the appropriate URLs to get all the results (a sketch follows these comments). If you get stuck, post a new question. – QHarr Mar 01 '19 at 14:53
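
A minimal sketch of that page loop, pulling these comments together: it assumes the site accepts a `page` query-string parameter alongside `per_page`, hard-codes the page count as a constant for illustration (derive it from the total result count in practice), writes every div on its own row so the output is vertical, and never resets r, so each page's results append below the previous ones.

Public Sub GetAllPages()
    Dim html As HTMLDocument, item As Object, div As Object
    Dim r As Long, page As Long
    Const MAX_PAGE As Long = 2                   'assumption: work this out from the total result count
    Set html = New HTMLDocument                  'VBE > Tools > References > Microsoft HTML Object Library
    For page = 1 To MAX_PAGE
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "https://www.worten.pt/grandes-eletrodomesticos/maquinas-de-roupa/" & _
                  "maquinas-de-roupa-ver-todos-marca-BALAY-e-BOSCH-e-SIEMENS" & _
                  "?per_page=100&page=" & page, False
            .send
            html.body.innerHTML = .responseText
        End With
        For Each item In html.getElementsByClassName("w-product__content")
            For Each div In item.getElementsByTagName("div")
                r = r + 1                        'r is never reset, so rows keep accumulating
                ThisWorkbook.Worksheets("Sheet1").Cells(r, 1) = div.innerText
            Next
        Next
    Next page
End Sub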