Check out my CsQuery project: https://github.com/jamietre/csquery or on nuget as "CsQuery".
This is a C# (.NET 4) port of jQuery. In basic performance tests (included in the project test suite) selectors are about 100 times faster than HTML Agility Pack + Fizzler (a css selector add-on for HAP); it's plenty fast for manipulating the output stream in real time on a typical web site. If you are amazon.com or something, of course, YMMV.
My initial purpose in developing this was to manipulate HTML from a content management system. Once I had it up and running, I found that using CSS selectors and the jQuery API is a whole lot more fun than using web controls and started using it as a primary HTML manipulation tool for server-rendered pages, and built it out to cover pretty much all of CSS, jQuery and the browser DOM. I haven't touched a web control since.
To intercept HTML in webforms with CsQuery you do this in the page codebehind:
using CsQuery;
using CsQuery.Web;
protected override void Render(HtmlTextWriter writer)
{
var csqContext = WebForms.CreateFromRender(Page, base.Render, writer);
// CQ object is like a jQuery object. The "Dom" property of the context
// returned above represents the output of this page.
CQ doc = csqContext.Dom;
doc["li > a"].AddClass("foo");
// write it
csqContext.Render();
}
To do the same thing in ASP.NET MVC please see this blog post describing that.
There is basic documentation for CsQuery on GitHub. Apart from getting HTML in and out, it works pretty much like jQuery. The WebForms
object above is just to help you handle interacting with the HtmlTextWriter
object and the Render
method. The general-purpose usage is very simple:
var doc = CQ.Create(htmlString);
// or (useful for scraping and testing)
var doc = CQ.CreateFromUrl(url);
// do stuff with doc, a CQ object that acts like a jQuery object
doc["table tr:first"].Append("<td>A new cell</td>");
Additonally, pretty much the entire browser DOM is available using the same methods you use
in a browser. The indexer [0] returns the first element in the selection set like jquery; if you are used to write javascript to manipulate HTML it should be very familiar.
// "Select" method is the same as the property indexer [] we used above.
// I go back and forth between them to emphasise their interchangeability.
var element = dom.Select("div > input[type=checkbox]:first-child")[0];
a.Checked=true;
Of course in C# you have a wealth of other general-purpose tools like LINQ at your disposal. Alternatively:
var element = dom["div > input[type=checkbox]:first-child"].Single();
a.Checked=true;
When you're done manipulating the document, you'll probably want to get the HTML out:
string html = doc.Render();
That's all there is to it. There are a vast number of methods on the CQ
object, covering all the jQuery DOM manipulation techniques. There are also utility methods for handling JSON, and it has extensive support for dynamic and anonymous types to make passing data structures (e.g. a set of CSS classes) as easy as possible -- much like jQuery.
Some More Advanced Stuff
I don't recommend doing this unless you are familiar with lower-level tinkering with asp.net's http workflow. There's nothing at all undoable but there will be a learning curve if you've never heard of an HttpHandler.
If you want to skip the WebForms engine altogether, you can create an IHttpHandler
that automatically parses HTML files. This would definitely perform better than overlaying on a the ASPX engine -- who knows, maybe even faster than doing a similar amount of server-side processing with web controls. You can then then register your handler using web.config for specific extensions (like htm
and html
).
Yet another way to automatically intercept is with routing. You can use the MVC routing library in a webforms app with no trouble, here's one description of how to do this. Then you can create a route that matches whatever pattern you want (again, perhaps *.html
) and pass handling off to a custom IHttpHandler
or class. In this case, you're doing everything: you will need to look at the path, load the file from the file system, parse it with CsQuery, and stream the response.
Using either mechanism, you'll need a way to tell your project what code to run for each page, of course. That is, just because you've created a nifty HTML parser, how do you then tell it to run the correct "code behind" for that page?
MVC does this by just locating a controller with the name of "PageNameController.cs" and calling a method that matches the name of the parameter. You could do whatever you want; e.g. you could add an element:
<script type="controller" src="MyPageController"></script>
Your generic handler code could look for such an element, and then use reflection to locate the correct named class & method to call. This is pretty involved, and beyond the scope of this answer; but if you're looking to build a whole new framework or something this is how you would go about it.