0

My goal is to scrape the source code of a web page.

The site seems to have different Frames which is why my code won't work properly.

I tried to modify a code which I found online which should solve the Frame issue.

The following code creates an error (object required) at:

Set profileFrame .document.getElementById("profileFrame")

Public Sub IE_Automation()

 'Needs references to Microsoft Internet Controls and Microsoft HTML Object Library

Dim baseURL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim profileFrame As HTMLIFrame
Dim slotsDiv As HTMLDivElement

'example URL with multiple frames
baseURL = "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"

Set IE = New InternetExplorer
With IE
    .Visible = True

     'Navigate to the main page

    .navigate baseURL & "/publictrophy/index.htm?onlinename=ace_anubis"
    While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

     'Get the profileFrame iframe and navigate to it

    Set profileFrame = .document.getElementById("profileFrame")

    .navigate baseURL & profileFrame.src
    While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

    Set HTMLdoc = .document
End With

 'Display all the text in the profileFrame iframe

MsgBox HTMLdoc.body.innerText

'Display just the text in the slots_container div

Set slotsDiv = HTMLdoc.getElementById("slots_container")
MsgBox slotsDiv.innerText

End Sub
Community
  • 1
  • 1
Andreas
  • 1
  • 1

2 Answers2

0

Hummmm, I'm not exactly sure what you are doing here, but can you try the code below?

Option Explicit

Sub Sample()
    Dim ie As Object
    Dim links As Variant, lnk As Variant
    Dim rowcount As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "https://www.xing.com/search/members?section=members&keywords=IT&filters%5Bcontact_level%5D=non_contact"

    'Wait for site to fully load
    'ie.Navigate2 URL
    Do While ie.Busy = True
       DoEvents
    Loop

    Set links = ie.document.getElementsByTagName("a")

    rowcount = 1

    With Sheets("Sheet1")
        For Each lnk In links
        'Debug.Print lnk.innerText
            'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
                .Range("A" & rowcount) = lnk.innerText
                rowcount = rowcount + 1
                'Exit For
            'End If
        Next
    End With
End Sub
ASH
  • 20,759
  • 19
  • 87
  • 200
  • Hi ryguy72 and thank you very much! Unfortunately I get an error "our of range". I made a work around for the issue and I now use imacros chrome plugin to scrap the content and then import it via VBA into excel. But of course a direct way would b propably way more convienient. Best wishes Andreas – Andreas Jun 18 '17 at 15:38
  • When you step through the code, by hitting F8 over and over, where does the error "our of range" occur? Which line throws the error? That should be a clue. Although, I can't surmise where the error would be? I don't see anything suspect and everything worked fine for me. – ASH Jun 19 '17 at 12:08
0

General:

I think in your research you may have come across this question and misunderstood how it relates/doesn't relate to your circumstance.

I don't think iFrames are relevant to your query. If you are after the list of names, their details and the URLs to their pages you can use the code below.


CSS Selectors

To target the elements of interest I use the following two CSS selectors. These use style infomation on the page to target the elements:

.SearchResults-link
.SearchResults-item

"." means class, which is like saying .getElementsByClassName. The first gets the links, and the second gets the description information on the first page.

With respect to the first CSS selector: The actual link required is dynamically constructed, but we can use the fact that the actual profile URLs have a common base string of "https://www.xing.com/profile/", which is then followed by the profileName. So, in function GetURL, we parse the outerHTML returned by the CSS selector to get the profileName and concatenate it with the BASESTRING constant to get our actual profile link.


Code:

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "https://www.xing.com/publicsearch/query?search%5Bq%5D=IT"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim a As Object, exitTime As Date, linksNodeList As Object, profileNodeList As Object

'        exitTime = Now + TimeSerial(0, 0, 5) '<== uncomment this section if timing problems
'
'        Do
'            DoEvents
'            On Error Resume Next
'            Set linksNodeList = .document.querySelectorAll(".SearchResults-link")
'            On Error GoTo 0
'            If Now > exitTime Then Exit Do
'        Loop While linksNodeList Is Nothing

        Set linksNodeList = .document.querySelectorAll(".SearchResults-link") '<== comment this out if uncommented section above
        Set profileNodeList = .document.querySelectorAll(".SearchResults-item")

        Dim i As Long
        For i = 0 To profileNodeList.Length - 1
            Debug.Print "Profile link: " & GetURL(linksNodeList.item(i).outerHTML)
            Debug.Print "Basic info: " & profileNodeList.item(i).innerText
        Next i
    End With
End Sub

Public Function GetURL(ByVal htmlSection As String) As String
    Const BASESTRING As String = "https://www.xing.com/profile/"
    Dim arr() As String
    arr = Split(htmlSection, "/")
    GetURL = BASESTRING & Replace$(Split((arr(UBound(arr) - 1)), ">")(0), Chr$(34), vbNullString)
End Function

Example return information:

Output example

QHarr
  • 83,427
  • 12
  • 54
  • 101