I'm currently working on a project at work (I'm an intern) that is expected to take me a few weeks to complete. It's basically a copy-and-paste job migrating content from one website to another. To save myself time and boredom, and possibly win myself a job (if I can complete this quickly), I'm looking at ways to automate the process. So far I've figured out every step except one.

I have another automation program that downloads the HTML for each page that needs to be copied over to the new site (over 1,000 pages) and converts it into a text file. From each of these files I need to extract just the body. I've identified a start tag and an end tag that bracket the content I need, and both appear in every one of the HTML files.

I'm currently attempting to use VBA in Excel to open each file, extract the data, and write the result to a new file. From there I can automate the copy-and-paste process.
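A minimal sketch of that open/extract/write step in VBA, assuming the downloaded pages are saved as `.txt` files in one folder and the results should go to a second folder that already exists. The sub name `ExtractAllFiles`, the folder paths, and the marker strings below are hypothetical placeholders; substitute your own folders and your own start and end tags.

```vba
' Minimal sketch (hypothetical names): for every .txt file in inputDir, write
' the text between startMarker and endMarker (markers excluded) to a file with
' the same name in outputDir. Both paths must end with "\" and outputDir must
' already exist. Files that are missing either marker are skipped.
' The file is read in Binary mode, so its bytes are interpreted using the
' system ANSI code page; non-ASCII characters in UTF-8 files may be garbled.
Sub ExtractAllFiles(ByVal inputDir As String, ByVal outputDir As String, _
                    ByVal startMarker As String, ByVal endMarker As String)
    Dim fileName As String
    Dim fileNum As Integer
    Dim content As String
    Dim startPos As Long
    Dim endPos As Long

    fileName = Dir(inputDir & "*.txt")              ' first matching file
    Do While fileName <> ""
        ' Read the whole file into one string
        fileNum = FreeFile
        Open inputDir & fileName For Binary Access Read As #fileNum
        content = Space$(LOF(fileNum))
        Get #fileNum, , content
        Close #fileNum

        ' InStr returns 0 when a marker is not found (case-insensitive search)
        startPos = InStr(1, content, startMarker, vbTextCompare)
        If startPos > 0 Then
            startPos = startPos + Len(startMarker)  ' skip the start marker itself
            endPos = InStr(startPos, content, endMarker, vbTextCompare)
            If endPos > 0 Then
                fileNum = FreeFile
                Open outputDir & fileName For Output As #fileNum
                ' Trailing semicolon: do not append an extra line break
                Print #fileNum, Mid$(content, startPos, endPos - startPos);
                Close #fileNum
            End If
        End If

        fileName = Dir()                            ' next matching file
    Loop
End Sub
```

For example, from the Immediate window (paths and markers hypothetical): `ExtractAllFiles "C:\pages\in\", "C:\pages\out\", "<!-- start -->", "<!-- end -->"`. Writing to a separate output folder avoids overwriting the input files while `Dir` is still iterating over them.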

What I can't figure out is how to extract the data between these two points. I can extract data between two plain strings, e.g. "Start" and "End"; however, I can't seem to extract data between two HTML tags. Any suggestions would be fantastic. I'm not a programmer, and I'm learning on the fly so I can finish this project as soon as possible.

Thanks in advance.

George Kemp
If you can do `start` and `end`, why can't you do the same with the two tags? Welcome to Stack Overflow. People generally appreciate it if you post the code you have already tried. – Robin Mackenzie Jul 20 '16 at 15:52

Generally, it would be done like this:

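    Sub Test()
        Dim IE As Object
        Dim x As String, label As String, Z As String
        Dim y As Long

        label = "Average Target Price:"

        Set IE = CreateObject("InternetExplorer.Application")
        With IE
            .Visible = True
            .Navigate "http://www.marketwatch.com/investing/stock/aapl/analystestimates" ' should work for any URL
            Do Until .ReadyState = 4: DoEvents: Loop    ' 4 = page fully loaded

            x = .document.body.innerText
            y = InStr(1, x, label)                      ' 0 if the label is not found
            If y > 0 Then
                ' Skip past the label, then take the next 6 characters
                Z = Mid(x, y + Len(label), 6)
                Range("A1").Value = Trim(Z)
            End If

            .Quit
        End With
    End Sub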

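In your specific case, it should be something like this (inside the same `With IE` block). Note that `innerText` returns only the rendered text with every tag stripped out, so to search for HTML tags you need `innerHTML`, which keeps the markup:

    a = .document.body.innerHTML               ' innerHTML keeps the tags; innerText strips them
    b = InStr(1, a, "Start") + Len("Start")    ' first character after the start marker
    c = InStr(b, a, "End")                     ' end marker, searched only after the start marker

    d = Mid(a, b, c - b)                       ' text between the two markers

    Range("A1").Value = Trim(d)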
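The snippet above assumes both markers are present. A slightly more defensive version, sketched here as a reusable VBA function (the name `ExtractBetween` is hypothetical), excludes the markers from the result, looks for the end marker only after the start marker, ignores case so that `<BODY>` and `<body>` both match, and returns an empty string if either marker is missing:

```vba
' Hypothetical helper: returns the text between startMarker and endMarker,
' excluding the markers themselves. Returns "" if either marker is missing.
' vbTextCompare makes the search case-insensitive.
Public Function ExtractBetween(ByVal text As String, _
                               ByVal startMarker As String, _
                               ByVal endMarker As String) As String
    Dim startPos As Long
    Dim endPos As Long

    ' InStr returns 0 when the start marker is not found
    startPos = InStr(1, text, startMarker, vbTextCompare)
    If startPos = 0 Then Exit Function          ' result stays ""

    ' Move past the start marker so it is not part of the result
    startPos = startPos + Len(startMarker)

    ' Search for the end marker only after the start marker
    endPos = InStr(startPos, text, endMarker, vbTextCompare)
    If endPos = 0 Then Exit Function            ' result stays ""

    ExtractBetween = Mid$(text, startPos, endPos - startPos)
End Function
```

For example, inside the `With IE` block: `Range("A1").Value = Trim(ExtractBetween(.document.body.innerHTML, "Start", "End"))`. The same function works on the contents of a saved text file, so Internet Explorer isn't strictly needed if the pages are already downloaded.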
ASH