
I need to load the content of a DIV element in HTML into a variable so that it can be printed to an HTA file. However, I have about 70 other DIV elements in the document and I need to access one specific one. Is there a shortcut to accessing the content of an element with a specific attribute?

Here's a sample of the element I need to access:

<div id='storytext'>
    <p>Story Text</p>
</div>
Nick Cox
  • Well with C# code you are not going to find the `DIV` unless it has a `runat="server"` attribute applied to it. – Karl Anderson Jun 28 '13 at 16:56
  • Are you using C# to parse an HTML file or is this on an asp.net page you've created? – RandomWebGuy Jun 28 '13 at 16:58
  • It's not my HTML code, the program's goal is to download the source code of a page, extract the HTML from that DIV element then write it to an HTA file. – Calzone Stromboli Jun 28 '13 at 17:02
  • You can regex the HTML and return the div tag, or have you tried loading the HTML into an XML parser and pulling the data out that way? *Most* HTML is basically XML, so this *should* work, but not always... – Nick DeMayo Jun 28 '13 at 17:11
  • Aha! I believe I found a solution, http://stackoverflow.com/questions/4737757/c-html-agility-selecting-every-paragraph-within-a-div-tag?rq=1 – Calzone Stromboli Jun 28 '13 at 17:13

2 Answers

I would use the HTML Agility Pack to pull out the content.

The code will look something like this:

using System.Linq;
using HtmlAgilityPack;

var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(rawHTML); // rawHTML is a string containing the HTML content

// FirstOrDefault returns null when no div has that id
var storyDiv = htmlDocument.DocumentNode.Descendants("div").FirstOrDefault(x => x.Id == "storytext");

From there you can use storyDiv.InnerText or storyDiv.InnerHtml to get the contents. (Don't forget to check that storyDiv is not null first.)
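Since the question asks for a shortcut to an element with a specific attribute, an XPath attribute predicate may be the most direct route. The following is only a sketch: the class name StoryExtractor and the method GetDivInnerHtml are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced and that the id value contains no single quote.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, named here only for illustration.
public static class StoryExtractor
{
    // Returns the inner HTML of the first <div> with the given id,
    // or null when the page contains no such element.
    public static string GetDivInnerHtml(string rawHtml, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(rawHtml); // LoadHtml parses a string; Load expects a path or stream

        // XPath attribute predicate. SelectSingleNode returns null when nothing matches.
        // Assumes id contains no single quote (it is spliced into the expression as-is).
        HtmlNode div = doc.DocumentNode.SelectSingleNode("//div[@id='" + id + "']");

        return div?.InnerHtml;
    }
}
```

Calling StoryExtractor.GetDivInnerHtml(pageSource, "storytext") would then give you the markup inside the DIV, ready to be written into the HTA file, or null if the page layout changed and the DIV is gone.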

Oscar Mederos
RandomWebGuy

So you're downloading a page that contains a DIV, programmatically, and you want to get the contents of that DIV?

Assuming you have the downloading of the page working, you might want to try the Html Agility Pack. This library gives you a LINQ to XML-like API that tolerates the looser standards of real-world HTML pages.

If you don't want to do that, and the DIV tag is very predictable (it has only the id attribute, or its attributes always appear in a particular order), you could use a regular expression to pull it out. However, that would require so much fiddling around (given that your DIV has HTML content) that I would recommend just starting with the Html Agility Pack.
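To make the regular-expression route concrete, here is a rough sketch using only System.Text.RegularExpressions. The class name RegexDivExtractor is made up for illustration, and the pattern assumes the DIV carries nothing but the id attribute, exactly as in the sample from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class RegexDivExtractor
{
    // Matches <div id='storytext'> or <div id="storytext"> when id is the only
    // attribute, then lazily captures everything up to the FIRST closing </div>.
    private static readonly Regex StoryDiv = new Regex(
        @"<div\s+id=['""]storytext['""]\s*>(?<body>.*?)</div>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the captured content, or null when the pattern does not match.
    public static string Extract(string rawHtml)
    {
        Match m = StoryDiv.Match(rawHtml);
        return m.Success ? m.Groups["body"].Value : null;
    }
}
```

This works for the sample, but it shows the fiddling I mean: because the capture stops at the first closing tag, a DIV nested inside the story text cuts the result short, and any extra attribute on the opening tag stops the pattern from matching at all.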

Ann L.
  • Thanks, I'm already pulling from two of my own libraries so I'd rather not have to go to another :P I do think I've found a solution in another post about the HTML Agility Pack, pretty sure it'll work here – Calzone Stromboli Jun 28 '13 at 17:22