
I have links to 500 Wikipedia / Wikimedia wikis, Talk pages and history pages in an Excel document. I'd like to parse them to determine things like how many of the wikis mention "advert" or "promotional" on the Talk page, how long the average wiki is, how frequent edits are, etc.

I've figured out how to write a Visual Basic user-defined function that will get the full HTML. Is there a plugin or some other way to get the text, as it appears on-screen, between two tags or identifiers, so I can pull out the information I need?

I am a business professional with very limited coding experience in comparison to a professional developer. But if you can point me in the right direction and to some good tutorials, I can learn. I'd also be interested in just paying someone a bit of money on the side if someone can help.

Adam Rackis

1 Answer


You can use an XML/HTML parser together with regular expressions to search for text in an HTML document.
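As a rough illustration of the regex half, here is a minimal Python sketch (Python rather than Visual Basic is my choice, not something from the question) that counts mentions of "advert" or "promotional" in a string of page text. The pattern, the `count_mentions` name and the sample sentence are all made up for the example; adjust the pattern to whatever wording you actually want to catch.

```python
import re

# Hypothetical pattern: matches "advert", "adverts", "advertising", etc.,
# or the word "promotional", ignoring case.
KEYWORDS = re.compile(r"\b(advert\w*|promotional)\b", re.IGNORECASE)

def count_mentions(text):
    """Return a dict mapping each matched keyword (lowercased) to its count."""
    counts = {}
    for match in KEYWORDS.finditer(text):
        word = match.group(1).lower()
        counts[word] = counts.get(word, 0) + 1
    return counts

# Made-up sample text standing in for a Talk page.
sample = "This article reads like an advert. Tagged as Promotional; remove advertising."
print(count_mentions(sample))
```

Running it on text rather than on raw HTML avoids accidental matches inside tag names, attributes or URLs.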

To get the text as it appears in the browser, write a function that strips out all the tags. The result may not always be accurate, though, because CSS and JavaScript can change what is visible on screen.
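A minimal sketch of such a tag-stripping function, written in Python with only the standard library's `html.parser` module (again an assumption on my part; the same idea can be ported to Visual Basic). The `VisibleTextExtractor` and `html_to_text` names are hypothetical. It drops every tag, skips the contents of `<script>` and `<style>` elements, and collapses whitespace. It does not evaluate CSS or JavaScript, so hidden elements will still show up in the output.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & etc.
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the fragments and collapse runs of whitespace to single spaces.
        return " ".join(" ".join(self._chunks).split())

def html_to_text(html):
    """Return the text content of an HTML string with all tags removed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()

print(html_to_text("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```

Because fragments are joined with spaces, text split by inline tags (for example `H<sub>2</sub>O`) comes out with extra spaces; that is usually acceptable for keyword counts and length estimates.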

Jake