I have VB code that uses HttpWebRequest to collect the HTML of hundreds of websites, but it takes a very long time to complete. The code is basically a For Each loop that reads the HTML of each website listed in a ListBox. Inside the loop, the HTML of each website is searched for specific words. I want to display, under a column for each word, the list of websites that contain that word.

For Each webAddr As String In lstbox.Items

    strHtml = Make_A_Call(webAddr)

    If strHtml.Contains("Keyword1") Then
        ..........
    End If
    If strHtml.Contains("Keyword2") Then
        ..........
    End If
    ..........
    ..........
    ..........
    ..........
    ..........
Next

Private Function Make_A_Call(ByVal strURL As String) As String
    Dim strResult As String
    Dim wbrq As HttpWebRequest
    Dim wbrs As HttpWebResponse
    Dim sr As StreamReader

    Try
        strResult = ""
        wbrq = WebRequest.Create(strURL)
        wbrq.Method = "GET"
        ' Read the returned data   
        wbrs = wbrq.GetResponse
        sr = New StreamReader(wbrs.GetResponseStream)
        strResult = sr.ReadToEnd.Trim
        sr.Close()
        sr.Dispose()
        wbrs.Close()
    Catch ex As Exception
        ErrMessage.Text = ex.Message.ToString
        ErrMessage.ForeColor = Color.Red
    End Try
    Return strResult
End Function

The compiled code takes almost 5 minutes to complete the loop, and sometimes it fails to complete. Can it be modified to improve performance? Please help with better code and suggestions.

rekire
  • 5 minutes for "hundreds of websites" doesn't seem very long. – Mat Sep 12 '12 at 18:14
  • Hmmm. If you want OK performance, rewrite your code to make use of the asynchronous WebRequest/WebClient/HttpClient APIs. If you want good performance, give up on .NET DNS resolution. If you want the best performance, give up on anything other than sockets. – spender Sep 12 '12 at 23:08

1 Answer

Remember, there are two separate bottlenecks:

  • Bandwidth to download the HTML
  • CPU processing

You can't necessarily speed up the downloading using parallel processing; that can only be helped by buying more bandwidth. What you can do, though, is ensure that the downloading and processing are done on separate threads. I'd suggest doing the following:

  • Use BackgroundWorker instances to download the data.
  • In the RunWorkerCompleted handler, first start the next BackgroundWorker, then process the result of the worker that just finished (the keyword search).
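
The two steps above could be sketched roughly as follows. This is only a minimal illustration, not a drop-in replacement for the code in the question: the class name KeywordScanner, the pending queue, and the Hits dictionary are hypothetical names made up for the example, and WebClient.DownloadString stands in for the Make_A_Call function to keep the sketch short. It also assumes Start is called from the WinForms UI thread, so that every RunWorkerCompleted handler runs on that one thread and Hits needs no locking.

```vbnet
Imports System.Collections.Generic
Imports System.ComponentModel
Imports System.Net

' Hypothetical helper class; none of these names come from the question.
Public Class KeywordScanner
    Private ReadOnly pending As New Queue(Of String)
    Private ReadOnly keywords As String() = {"Keyword1", "Keyword2"}

    ' Maps each keyword to the list of sites whose HTML contains it.
    Public ReadOnly Hits As New Dictionary(Of String, List(Of String))

    ' Assumption: called from the UI thread.
    Public Sub Start(urls As IEnumerable(Of String))
        For Each url As String In urls
            pending.Enqueue(url)
        Next
        StartNextDownload()
    End Sub

    Private Sub StartNextDownload()
        If pending.Count = 0 Then Return
        Dim worker As New BackgroundWorker()
        AddHandler worker.DoWork, AddressOf DownloadPage
        AddHandler worker.RunWorkerCompleted, AddressOf PageDownloaded
        worker.RunWorkerAsync(pending.Dequeue())
    End Sub

    ' Runs on a background thread: only the download happens here.
    Private Sub DownloadPage(sender As Object, e As DoWorkEventArgs)
        Dim url As String = CStr(e.Argument)
        Using client As New WebClient()
            e.Result = New KeyValuePair(Of String, String)(url, client.DownloadString(url))
        End Using
    End Sub

    ' Runs on the thread that called Start (the UI thread, by assumption).
    Private Sub PageDownloaded(sender As Object, e As RunWorkerCompletedEventArgs)
        DirectCast(sender, BackgroundWorker).Dispose()

        ' First start the next download, so it overlaps with the search below.
        StartNextDownload()

        ' A failed download is skipped here; reading e.Result would rethrow the error.
        If e.Error IsNot Nothing Then Return

        Dim page As KeyValuePair(Of String, String) =
            DirectCast(e.Result, KeyValuePair(Of String, String))
        For Each word As String In keywords
            If page.Value.Contains(word) Then
                If Not Hits.ContainsKey(word) Then Hits(word) = New List(Of String)
                Hits(word).Add(page.Key)
            End If
        Next
    End Sub
End Class
```

With this shape, a form could call something like scanner.Start(lstbox.Items.Cast(Of String)()) and fill the per-keyword columns from Hits once the queue is empty. Only one download is in flight at a time, which matches the suggestion above; the gain comes from the keyword search overlapping with the next download rather than from parallel requests.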
McGarnagle