4

I've recently encountered a serious memory leak issue with IE 11 that doesn't occur at all with Google Chrome. (It does occur with Firefox, but not to the same extent).

To observe it, follow these steps:

  1. Open an IE 11 window and a task manager window side-by-side. (It doesn't matter if you open IE in normal or no-extensions mode.)
  2. Go to IMDB.COM and open a movie page. Observe how much memory IE is consuming.
  3. Refresh the page 10 times and again observe the memory usage.

You'll see that IE is now consuming about 90 MEG more memory than when you first loaded the page. Repeating the same test with Google Chrome results in no significant additional memory consumption.

The behavior doesn't seem to occur with other websites -- just with IMDB.COM. Unfortunately, I have to scrape this site for a movie festival, and accessing too many pages always results in a crash. I've fashioned a workaround (closing and reopening the IE object every 50 pages), but I'd like to know if there's some way I can get IE to behave like Google Chrome at this site.

trevbet
  • 145
  • 1
  • 12
  • May we ask why you need to use IE at all? Why not just use Chrome, or better yet write a console program which scrapes IMDB for you. – Tim Biegeleisen May 17 '16 at 01:05
  • Give me a Chrome object that I can control from VBA, and I definitely will. I've been scraping IMDB smoothly and successfully using VBA from within MS Access for several years now. I really like having the IE object, regex, and database technology under one roof, and being able to perform intermediate operations on items I scrape before saving them in a table. – trevbet May 17 '16 at 01:35

2 Answers2

1

You want Chrome, you can have Chrome! Use selenium basic and let's see what is on at the movies...

Option Explicit
Public Sub GetMovies()
    Dim d As WebDriver, html As HTMLDocument
    Set d = New ChromeDriver
    Const url = "https://www.imdb.com/"
    Application.ScreenUpdating = False
    With d
        .AddArgument "--headless"
        .Start "Chrome"
        .get url
        Set html = New HTMLDocument
        html.body.innerHTML = .PageSource '<== Just to show you how to transfer HTML
        Dim links As Object, i As Long
        Set links = html.querySelectorAll(".title a")
        For i = 0 To links.Length - 1
            Debug.Print links(i).innerText
        Next i
        .Quit
        Application.ScreenUpdating = True
    End With
End Sub

For large numbers of requests though you will want XHR as faster then load the HTML from that into an HTML Document:

Public Function GetSource(ByVal sURL As String) As HTMLDocument
    Dim oXHTTP As Object, HTML As New HTMLDocument,sResponse As String
    Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
    oXHTTP.Open "GET", sURL, False
    oXHTTP.send
    sResponse = StrConv(oXHTTP.responseBody, vbUnicode)
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    HTML.body.innerHTML = sResponse
    Set GetSource = HTML 
End Function
QHarr
  • 83,427
  • 12
  • 54
  • 101
0

I've surmounted this problem by using an MSXML2.XMLHTTP object instead of an IE object for acquiring HTML pages to parse with Regex. If there's an even superior way of downloading HTML to VBA, please cite it.

trevbet
  • 145
  • 1
  • 12