0

I have a webpage to web scrape which presents products with cheaper price first availability and seller name.

I need to get the first seller who has the product "Available" in a set of sellers

The list of sellers are presented in the following way

<li class="card js-product-card"
  data-shopid="336"
  data-shop-review-score="4.5"
  data-shop-reviews-count="265">

<div class="shop cf">
  <a title="Supplier1" class="shop-logo js-shop-logo" href="/m/336/Supplier1"></a>
</div>

<div class="description">
    <div class="item js-product" data-product-id="35583015">
      <h3>
        <a title="Τροφοδοτικό (PSU) Thermaltake Smart RGB 700W" </a>
      </h3>
      <p class="availability">
        <span class="availability instock">Not Available</span>
      </p>
    </div>
</div>
<div class="price" >
  <div class="js-blp pre-blp content-placeholder">
    <a title="Δες το στο κατάστημα" rel="nofollow" class="product-link js-product-link content-placeholder" href="/products/show/35289523">63,00€</a>
  </div>

</div>
</li>

<li class="card js-product-card"
  data-shopid="336"
  data-shop-review-score="4.5"
  data-shop-reviews-count="265">

<div class="shop cf">
  <a title="Supplier1" class="shop-logo js-shop-logo" href="/m/336/Supplier1"></a>
</div>

<div class="description">
    <div class="item js-product" data-product-id="35583015">
      <h3>
        <a title="Τροφοδοτικό (PSU) Thermaltake Smart RGB 700W" </a>
      </h3>
      <p class="availability">
        <span class="availability instock">Available</span>
      </p>
    </div>
</div>

<div class="price" >
  <div class="js-blp pre-blp content-placeholder">
    <a title="Δες το στο κατάστημα" rel="nofollow" class="product-link js-product-link content-placeholder" href="/products/show/35289523">68,00€</a>
  </div>
</div>
</li>

<li class="card js-product-card"
  data-shopid="338"
  data-shop-review-score="3"
  data-shop-reviews-count="135">

<div class="shop cf">
  <a title="Supplier3" class="shop-logo js-shop-logo" href="/m/336/Supplier3"></a>
</div>

<div class="description">
    <div class="item js-product" data-product-id="37583015">
      <h3>
        <a title="Τροφοδοτικό (PSU) Thermaltake Smart RGB 700W" </a>
      </h3>
      <p class="availability">
        <span class="availability instock">Available in 1-3 Days</span>
      </p>
    </div>
</div>
<div class="price" >
  <div class="js-blp pre-blp content-placeholder">
    <a title="Δες το στο κατάστημα" rel="nofollow" class="product-link js-product-link content-placeholder" href="/products/show/35289523">69,00€</a>
  </div>
</div>
</li>

I need to get the Title of < a > tag which is the seller's name and the text Available together with the price

Community
  • 1
  • 1

2 Answers2

0

I went to that URL and it was Greek to me! Sorry. Couldn't resist. But even after Google translate, the word "available" still didn't appear anywhere.

But the first comment on your question is the fair answer - basically asking for the code to be written from scratch is not the kind of question this service is intended for.

In general terms, Excel is able to "web-scrape", and it is able to search for strings. If you have never done either (you didn't say), then it's important to start by doing it manually at least once before you start doing it programmatically with VBA.

There are, obviously, not just instructions but YouTube Videos on how to web-scrape with Excel, if you need a primer: https://www.youtube.com/watch?v=YQbe20G0ZSM

If you're starting from scratch, though, I have to question using Excel at all; I would mention Perl for web-client work, because it has a wealth of tools and boilerplate code for it, whole O'Reilly books on it - and because I'm an Old Person (tm). I'm certain you can do all the same these days with Python. Both are hugely popular for web-scraping ; Excel, not so much.

That said, Excel has in common with Perl that there's almost too many tools and More Than One Way To Do It.

This answer on StackExchange indicated how to grab a web page with an "internet explorer object": Webscraping using VBA excel

whereas this one counseled the use of an "HTMLDocument" object: Scraping the source code using VBA-Macros

...which I'd say is much more recent technique. And I believe there are even other code libraries to tap, something called "MSXML2.ServerXMLHTTP.6.0" .

Once you've picked a way to grab text strings from the web at all, you have to search them. Which brings us to whether you want to learn about "Regular Expressions" or "Regexp" techniques. They have a learning curve. If not, Excel has basic string search functions that will still find the strings you want, with a little more VBA code.

So we'll leave it there until you can outline your problem more fully so that you aren't asking for answerers to just write all the code for you.

Roy Brander
  • 121
  • 6
0

I would gather all the products and loop them until you find a product that has the compound class name availability instock. Extract the associated product id and use that to target elements for the same id to gather the other info.


XMLHTTP request version:

Option Explicit
Public Sub GetInfo()
    Dim i As Long, sResponse As String, html As HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.skroutz.gr/s/12685474/Thermaltake-Smart-RGB-700W.html", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = .responseText
    End With
    Stop

    Set html = New HTMLDocument
    html.body.innerHTML = sResponse

    Dim products As Object, product As Object
    Set products = html.getElementsByClassName("card js-product-card")
    Dim id As String
    For Each product In products
        If InStr(product.outerHTML, "availability instock") > 0 Then
            id = product.getAttribute("data-product-id")
            'You have the id now to use to select as everything else. Use this later
            Exit For
        End If
    Next
    'other code using product id to target for values
    If id <> vbNullString Then
        On Error Resume Next
        Debug.Print html.querySelector("[data-uservoice-pid='" & id & "']").title
        Debug.Print html.querySelector("[data-product-id='" & id & "'] p.availability").innerText
        Debug.Print html.querySelector("[data-product-id='" & id & "'] .price").innerText
        On Error GoTo 0

    End If
End Sub

Browser version:

Option Explicit

Public Sub GetInfo()
    Dim ie As New InternetExplorer, i As Long
    With ie
        .Visible = True
        .Navigate2 "https://www.skroutz.gr/s/12685474/Thermaltake-Smart-RGB-700W.html"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim products As Object, product As Object
        Set products = .document.getElementsByClassName("card js-product-card")
        Dim id As String
        For Each product In products
            If product.getElementsByClassName("availability instock").Length > 0 Then
                id = product.getAttribute("data-product-id")
                'You have the id now to use to select as everything else. Use this later
                Exit For
            End If
        Next
        'other code using product id to target for values
        If id <> vbNullString Then
            On Error Resume Next
            Debug.Print .document.querySelector("[data-uservoice-pid='" & id & "']").title
            Debug.Print .document.querySelector("[data-product-id='" & id & "'] p.availability").innerText
            Debug.Print .document.querySelector("[data-product-id='" & id & "'] .price").innerText
            On Error GoTo 0

        End If
        .Quit
    End With
End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101