5

To pause code until a web page is fully loaded, I've been using the method below with great success almost all of the time.

Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

But occasionally, I see text content load after the method determines that the page is fully loaded, and so that content is not extracted.

However, if I step through the code via F8, the content is extracted every time. This is done about as fast as I can press the F8 key repeatedly.

So how can I check to ensure the page, and all its content, is fully loaded before the code continues to extract data?

In both cases, IE is running invisibly. However, I've tried this with IE visible and there is actually content in this specific location on the pages I'm working with.

This is being done in Excel 2016, using VBA script. The specific content request is written like:

 'get item name from page and write it to the first cell on the first empty row available
 Set itemName = objIE.document.querySelector(".the-item-name")
 Worksheets("Results").Range("A1048576").End(xlUp).Offset(1, 0).Value = itemName.innerText

I've read through Excel VBA: Wait for JavaScript execution in Internet Explorer because I think that maybe the values are getting added after the document is loaded, in an effort to prevent anyone from scraping data. However, I can't seem to identify any script that may be doing that. Doesn't mean it isn't there. I just can't see it yet.

A specific example of the page with this issue is URL

https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html

Initially product-total-price div element contains dash (-), prior to the price being loaded, so that's what the request will return: - / each instead of $11.29 / each.

I have a workaround, but it's not as efficient or as concise as I'd like it to be. I test the string returned for the presence of the dash. If it's there, loop and check it again, else capture it and insert it into the worksheet.

setPriceUM:
    Set hdPriceUM = objIE.document.querySelector(".product-total-price").innerTe‌​‌​xt
    hdPriceUMString = hdPriceUM.innerText
    stringTest = InStr(hdPriceUMString, "-")
    If stringTest = True Then
        GoTo setPriceUM
    Else
        Debug.Print hdPriceUMString
    End If

Thank you for taking the time to read this and consider it.

omegastripes
  • 12,351
  • 4
  • 45
  • 96
Innertube
  • 51
  • 1
  • 6
  • 1
    Excellent question. I struggle with this issue as well, but fortunately, a way I have always solved my problem was adding a loop after the initial loop that contained the `.Busy` - looping until my object becomes available. So, my question is: are all your objects available after the page completely "loads"? Example: You `Set SubmitBtn = doc.getElement.....`... Is that `Nothing`, prompting a runtime error? – K.Dᴀᴠɪs Nov 01 '17 at 22:52
  • I too have run into this problem, but it's usually a network problem. When the network is running fast, no problems, but when it's starts to show some lag time, the pages don't fully load. I've had to resort to adding a wait timer to allow extra time. `Application.Wait (Now + TimeValue("0:00:06"))` This being a 6 second wait. – Mitch Nov 02 '17 at 20:09
  • @Mitch Timers is unreliable solution, especially for slow network. – omegastripes Nov 02 '17 at 21:23
  • @omegastripes I totally agree. Our internal site is all done using kendo-ui, which, for me, is very hard to figure out. We have 4 T1's at our location, which you wouldn't think network lag would be an issue, but it happens occasionally. Most of the issue is with the multitude of users we have accessing the same system, and then the server communicates with the SQL server. I'm still wondering why we're not running everything on a Unix system (server side). Oh well, thanks for the comment ! I endeavor to persevere. – Mitch Nov 03 '17 at 11:24

1 Answers1

3

Functionality of webpages is very different, so there is no solution that will fit to all of them.

Regarding your example, your workaround is a working solution, the code might be like:

Sub TestIE()

    Dim q

    With CreateObject("InternetExplorer.Application")
        .Visible = True
        .Navigate "https://www.homedepot.ca/en/home/p.dry-cloth-refills-32---count.1000660019.html"
        ' Wait IE
        Do While .readyState < 4 Or .Busy
            DoEvents
        Loop
        ' Wait document
        Do While .document.readyState <> "complete"
            DoEvents
        Loop
        ' Wait element
        Do
            q = .document.querySelector(".product-total-price").innerText
            If Left(q, 1) <> "-" Then Exit Do
            DoEvents
        Loop
        .Quit
    End With
    Debug.Print q

End Sub

Anyway, you need to look into the webpage loading process, XHRs and DOM modifications, using browser developer tools (F12). Going that way, you may find that one of the numerous XHRs returns the price in JSON format. It's logged on network tab of browser developer tools right before the price appearing while the page is loading. That XHR is made by one of the loaded JS, notably after the page loaded event. Try this URL (I just copied it from network tab):

https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en

So you may just reproduce that XHR and extract the price by splitting:

Sub TestXHR()

    Dim q

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.homedepot.ca/homedepotcacommercewebservices/v2/homedepotca/products/1000660019/localized/9999?catalogVersion=Online&lang=en", False
        .Send
        q = .ResponseText
    End With
    q = Replace(q, " : ", ":")
    q = Split(q, """displayPrice""", 2)(1)
    q = Split(q, """formattedValue"":""", 2)(1)
    q = Split(q, """", 2)(0)
    Debug.Print q

End Sub

But again, there is no common case.

You may also use JSON parser, take a look at some examples.

omegastripes
  • 12,351
  • 4
  • 45
  • 96
  • Omegastripes, what you've done in the first code snippet is almost identical to my workaround, so that's encouraging to me that my coding quality is coming along. XHR is new-ish to me and I haven't done any JSON work since it first appeared about 15 years ago. I'll get up to speed on both of those. I just started looking into the MSXML2.XMLHTTP function last week. I believe you're right that this is the way to go. I'll give it a go at work and report back here. Thank you! – Innertube Nov 05 '17 at 16:49