I am trying to come up with a neat solution to create automated JSON schema markup on my ASPX pages. The markup in question is FAQPage, but that's irrelevant.
I decided that I needed to scrape the content of the current page to find the questions and answers. After a few false starts I came across the HtmlAgilityPack library, which lets me do what I want, but I've run into some issues.
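For context, the end product of the scraping is a JSON-LD block. A minimal sketch of that final step follows. It assumes the questions and answers have already been extracted as plain strings, and that System.Text.Json is available (built into .NET Core 3.0 and later, a NuGet package on .NET Framework). `FaqSchema` and `BuildFaqPageJsonLd` are hypothetical names, not part of HtmlAgilityPack or ASP.NET:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper (not from the original question): turns scraped
// question/answer pairs into schema.org FAQPage JSON-LD.
public static class FaqSchema
{
    public static string BuildFaqPageJsonLd(IEnumerable<(string Question, string Answer)> pairs)
    {
        // Dictionaries are used because keys such as "@context" and "@type"
        // cannot be expressed as C# property names.
        var schema = new Dictionary<string, object>
        {
            ["@context"] = "https://schema.org",
            ["@type"] = "FAQPage",
            // One Question entity per scraped pair, each with an accepted Answer.
            ["mainEntity"] = pairs.Select(p => new Dictionary<string, object>
            {
                ["@type"] = "Question",
                ["name"] = p.Question,
                ["acceptedAnswer"] = new Dictionary<string, object>
                {
                    ["@type"] = "Answer",
                    ["text"] = p.Answer
                }
            }).ToList()
        };

        // The default encoder escapes HTML-sensitive characters such as '<' and '>',
        // so the result can be placed inside a <script> element.
        return JsonSerializer.Serialize(schema);
    }
}
```

On the page, the returned string would be written inside a `<script type="application/ld+json">` element, for example through a Literal control. Because `<` is escaped, answer text containing markup cannot close the script element early.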
The HtmlAgilityPack parser can be fed its input in a number of ways, but the only one I could get to work for my scenario (scraping the current page) was to give it a string.
First, I gave the element holding the content an ID and a runat="server" attribute, so that it is available as a server control.
To get the string, I used HtmlTextWriter; here's the code:
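```csharp
using System.IO;
using System.Web.UI;

// Renders a server control (and its children) to an HTML string.
static string ConvertControlToString(Control ctl)
{
    using (var sw = new StringWriter())
    using (var w = new HtmlTextWriter(sw))
    {
        ctl.RenderControl(w);
        w.Flush(); // make sure everything written reaches the StringWriter
        return sw.ToString();
    }
}
```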
All of that works fine, in most cases.
However, I'm running into edge cases on pages that use a ScriptManager and UpdatePanels, and I suspect there will be more. The error is: `... must be inside a form control with a runat="server"`. Of course it is, but RenderControl() doesn't realise it.
So, two questions:
- Is there a way to feed the HtmlAgilityPack parser that doesn't require a string (and that won't loop)?
- Is there a better way to scrape the text than Control.RenderControl(), one that won't cause these errors?
Incidentally, I've found a fix for this error, but it involves modifying each affected page, which isn't great.
So I thought I'd throw it out there and see whether there are better workarounds or a better solution.