ASP.NET - Parse / Query HTML Before Transmission and Insert CSS Class References

Question

As a web developer I feel too much of my time is spent on CSS. I am trying to come up with a solution where I can write re-usable CSS i.e. classes and reference these classes in the HTML without additional code in ASPX or ASCX files etc. or code-behind files. I want an intermediary which links up HTML elements with CSS classes.

What I want to achieve:

Modify HTML immediately before transmission
Select elements in the HTML
Based on rules defined elsewhere (e.g. in a text file relating to the page currently being processed):
Add a CSS class reference to multiple HTML elements
Add multiple CSS class references to a single HTML element

How I envisage this working:

Extend ASP.NET functions which generate final HTML
Grab all the HTML as a string
Pass the string into a constructor for an object with querying (e.g. XPath) methods
Go through list of global rules e.g. for child ul of first div then class = "navigation"
Go through list of page specific rules e.g. for child ul of first div then class &= " home"
Get processed HTML from object e.g. obj.ToString
ASP.NET to resume page generation using processed HTML

So what I need to know is:

Where / how can I extend ASP.NET page generation functions (to get all HTML of page)
What classes have element / node querying methods and access to attributes

Thanks for your help in advance.

P.S. I am developing ASP.NET Web Forms websites with VB.NET code-behinds running on IIS 7.

I don't have an answer on this particular method, but if you want to write re-usable CSS, I'd suggest learning a CSS-generating language like [SASS](http://sass-lang.com/) or [Less](http://lesscss.org/). — avesse, Jun 11 '12 at 20:01
IMHO, this approach can lead to serious performance issues (realizing the final HTML instead of streaming it to the standard response output stream, reparsing it, modifying it, etc.). Anyway, are you aware of ASP.NET Device Filtering technology (http://msdn.microsoft.com/en-us/library/ms178620.aspx)? That could help with what you are trying to achieve. — Simon Mourier, Jun 13 '12 at 16:17
Device Filtering looks interesting but doesn't it require altering my markup in the ASPX pages? I'm trying to work out how I can do this separately from my ASPX pages and code-behind, in a "globally" scoped file. — Chris Cannon, Jun 13 '12 at 20:54
@ChrisCannon - yes, it's based on markup. ASP.NET markup extension is often based on some initial markup. (ps: when you make a comment for someone here on SO, prefix your comment with '@nickname' or the recipient does not know you made a comment) — Simon Mourier, Jun 14 '12 at 10:35

Jamie Treworgy · Accepted Answer · 2012-06-14T18:42:49.227

Check out my CsQuery project: https://github.com/jamietre/csquery or on nuget as "CsQuery".

This is a C# (.NET 4) port of jQuery. In basic performance tests (included in the project test suite) selectors are about 100 times faster than HTML Agility Pack + Fizzler (a CSS selector add-on for HAP); it's plenty fast for manipulating the output stream in real time on a typical web site. If you are amazon.com or something, of course, YMMV.

My initial purpose in developing this was to manipulate HTML from a content management system. Once I had it up and running, I found that using CSS selectors and the jQuery API is a whole lot more fun than using web controls and started using it as a primary HTML manipulation tool for server-rendered pages, and built it out to cover pretty much all of CSS, jQuery and the browser DOM. I haven't touched a web control since.

To intercept HTML in webforms with CsQuery you do this in the page codebehind:

using CsQuery;
using CsQuery.Web;

protected override void Render(HtmlTextWriter writer)
{

    var csqContext = WebForms.CreateFromRender(Page, base.Render, writer);

    // CQ object is like a jQuery object. The "Dom" property of the context
    // returned above represents the output of this page.

    CQ doc = csqContext.Dom;

    doc["li > a"].AddClass("foo");

    // write it
    csqContext.Render();
}

To do the same thing in ASP.NET MVC please see this blog post describing that.

There is basic documentation for CsQuery on GitHub. Apart from getting HTML in and out, it works pretty much like jQuery. The WebForms object above is just to help you handle interacting with the HtmlTextWriter object and the Render method. The general-purpose usage is very simple:

var doc = CQ.Create(htmlString);

// or (useful for scraping and testing)
var doc = CQ.CreateFromUrl(url);

// do stuff with doc, a CQ object that acts like a jQuery object

doc["table tr:first"].Append("<td>A new cell</td>");

Additionally, pretty much the entire browser DOM is available using the same methods you use in a browser. The indexer [0] returns the first element in the selection set, like jQuery; if you are used to writing JavaScript to manipulate HTML it should be very familiar.

// "Select" method is the same as the property indexer [] we used above.
// I go back and forth between them to emphasise their interchangeability.

var element = doc.Select("div > input[type=checkbox]:first-child")[0];
element.Checked = true;

Of course in C# you have a wealth of other general-purpose tools like LINQ at your disposal. Alternatively:

var element = doc["div > input[type=checkbox]:first-child"].Single();

element.Checked = true;

When you're done manipulating the document, you'll probably want to get the HTML out:

string html = doc.Render();

That's all there is to it. There are a vast number of methods on the CQ object, covering all the jQuery DOM manipulation techniques. There are also utility methods for handling JSON, and it has extensive support for dynamic and anonymous types to make passing data structures (e.g. a set of CSS classes) as easy as possible -- much like jQuery.
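
Tying this back to the question, a rules-driven pass could be sketched roughly as follows. This is only a sketch under assumptions: the `CssRule` and `CssRules` types and the `selector => classes` line format are hypothetical and not part of CsQuery. The `addClass` delegate stands in for whichever parser does the actual work, so the rule handling itself has no dependency on it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rule: one CSS selector plus the class names to add to its matches.
public sealed class CssRule
{
    public string Selector { get; private set; }
    public string Classes { get; private set; }

    public CssRule(string selector, string classes)
    {
        Selector = selector;
        Classes = classes;
    }
}

public static class CssRules
{
    // Parses lines of the assumed form "selector => class1 class2".
    // Blank lines are skipped; a line without "=>" is rejected.
    public static List<CssRule> Parse(IEnumerable<string> lines)
    {
        var rules = new List<CssRule>();
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0) continue;

            int arrow = line.IndexOf("=>", StringComparison.Ordinal);
            if (arrow < 0) throw new FormatException("Missing '=>' in rule: " + line);

            rules.Add(new CssRule(line.Substring(0, arrow).Trim(),
                                  line.Substring(arrow + 2).Trim()));
        }
        return rules;
    }

    // Applies global rules first, then page-specific ones, so a page can add
    // further classes to elements a global rule has already touched.
    public static void Apply(IEnumerable<CssRule> globalRules,
                             IEnumerable<CssRule> pageRules,
                             Action<string, string> addClass)
    {
        foreach (var rule in globalRules) addClass(rule.Selector, rule.Classes);
        foreach (var rule in pageRules) addClass(rule.Selector, rule.Classes);
    }
}
```

Inside the Render override shown earlier, the call might then be `CssRules.Apply(globalRules, pageRules, (sel, cls) => doc[sel].AddClass(cls));` just before `csqContext.Render()`, with a rule line such as `div:first > ul => navigation` kept in a text file per page.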

Some More Advanced Stuff

I don't recommend doing this unless you are familiar with lower-level tinkering with ASP.NET's HTTP workflow. None of it is impossible, but there will be a learning curve if you've never heard of an HttpHandler.

If you want to skip the WebForms engine altogether, you can create an IHttpHandler that automatically parses HTML files. This would definitely perform better than overlaying on the ASPX engine -- who knows, maybe even faster than doing a similar amount of server-side processing with web controls. You can then register your handler using web.config for specific extensions (like htm and html).
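
As a rough illustration of the handler idea, a minimal sketch might look like the following. The class name `HtmlFileHandler` is hypothetical, and the `li > a` / `foo` manipulation is simply reused from the Render example above. It assumes the requested path maps directly to a file on disk.

```csharp
using System.IO;
using System.Web;
using CsQuery;

// Hypothetical handler: serves static .html files after a CsQuery pass.
public class HtmlFileHandler : IHttpHandler
{
    // The handler keeps no per-request state, so one instance can be reused.
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Assumption: the URL maps straight to a physical file.
        string path = context.Request.PhysicalPath;
        if (!File.Exists(path))
        {
            context.Response.StatusCode = 404;
            return;
        }

        // Parse the file, manipulate it, and write the result to the response.
        CQ doc = CQ.Create(File.ReadAllText(path));
        doc["li > a"].AddClass("foo");   // same manipulation as the Render example

        context.Response.ContentType = "text/html";
        context.Response.Write(doc.Render());
    }
}
```

On IIS 7 in integrated mode, registration would typically be an `<add>` element under `<system.webServer><handlers>` in web.config, with `path="*.html"` and `type` set to the handler's type name. For a compiled project that is normally the assembly-qualified name.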

Yet another way to automatically intercept is with routing. You can use the MVC routing library in a webforms app with no trouble; here's one description of how to do this. Then you can create a route that matches whatever pattern you want (again, perhaps *.html) and pass handling off to a custom IHttpHandler or class. In this case, you're doing everything: you will need to look at the path, load the file from the file system, parse it with CsQuery, and stream the response.

Using either mechanism, you'll need a way to tell your project what code to run for each page, of course. That is, just because you've created a nifty HTML parser, how do you then tell it to run the correct "code behind" for that page?

MVC does this by just locating a controller with the name of "PageNameController.cs" and calling a method that matches the name of the parameter. You could do whatever you want; e.g. you could add an element:

<script type="controller" src="MyPageController"></script>

Your generic handler code could look for such an element, and then use reflection to locate the correct named class & method to call. This is pretty involved, and beyond the scope of this answer; but if you're looking to build a whole new framework or something this is how you would go about it.

Thanks for your response, but I have to say that replacing the WebForms engine seems bonkers! But then there is a fine line between bonkers and genius! I will move to MVC at some point, but my main goal ATM is reducing time spent adding CSS classes into numerous ASPX pages etc. This is where CsQuery will come into play, and I plan to override the Render method as above, but then call my own function where all the parsing takes place, so effectively the Render method for each page will just be a wrapper for my AddCssClasses function in my CssHelper class. — Chris Cannon, Jun 14 '12 at 22:02
I just have a couple of questions regarding your post: (a) You mention that not all of the DOM is available? Just curious! (b) You mention that CsQuery is faster than HAP + Fizzler but is it definitely faster than HAP alone? (c) Please expand on "and it has extensive support for dynamic and anonymous types to make passing data structures (e.g. a set of CSS classes) as easy as possible" as I'm not sure how a set of CSS classes could be a data type? I think I know what you mean I'm just not 100% — Chris Cannon, Jun 14 '12 at 22:09
Sorry, for (c) I mean, are you talking PURELY about JSON coverage there? If not then how would I represent a set of CSS classes in a data type? Which parts of CsQuery are generic? — Chris Cannon, Jun 14 '12 at 22:22
a) not all of DOM element properties have been implemented, in many cases because it doesn't make sense in a client model (e.g. events), in some cases because the property is essentially the same as an attribute in this context (e.g. "href") so it's not that important. b) HAP alone is REALLY slow. I initially did a complex descendant selector test on my 6MB doc with XML selectors; it took minutes. HAP alone is not indexed in any way. c) See jquery docs for "css(map)": http://api.jquery.com/css/ and refer to the readme on the CsQuery github page for passing CSS props as an anon object. — Jamie Treworgy, Jun 15 '12 at 01:18
Oh.. "replacing the WebForms engine seems bonkers!" yeah well I was stuck with WebForms on this big ol' legacy project and had already set up routing so I could use REST paths so it wasn't that much of a stretch at that point :) what can I say I'm a hacker. The way you plan to do it is certainly the most reasonable for your goals, I just didn't want to leave anything off the table! — Jamie Treworgy, Jun 15 '12 at 01:21
One more comment re: json vs. CSS - CsQuery will actually let you pass in JSON directly, e.g. `doc["div > span"].CssSet("{ \"height\": 10, \"width\": 10}");` will add css styles height & width with values 10 on all span 1st children of all divs. You can also use objects, e.g. `doc["div > span"].CssSet(new { height=10, width=10});` does the same thing with an anonymous object. See the readme on github for details. — Jamie Treworgy, Jun 15 '12 at 01:24
Hi, just a quick question: is the `protected override void Render` the same when in the code-behind for a master page? Wouldn't `base.Render` then point to the render for the master page and not the page, or doesn't it matter? Thanks. — Chris Cannon, Jul 30 '13 at 10:40

score 1 · Answer 2 · answered Jun 14 '12 at 19:07

Intercepting the content of the page prior to it being sent is rather simple. I did this a while back on a project that compressed content on the fly: http://optimizerprime.codeplex.com/ (It's ugly, but it did its job and you might be able to salvage some of the code). Anyway, what you want to do is the following:

1) Create a Stream object that saves the content of the page until Flush is called. For instance I used this in my compression project: http://optimizerprime.codeplex.com/SourceControl/changeset/view/83171#1795869 Like I said before, it's not pretty. My point is that you'll need to create your own Stream class that will do what you want (in this case give you the string output of the page, parse/modify the string, and then output it to the user).

2) Assign the page's filter object to it (Page.Response.Filter). Note that you need to do it rather early on so you can catch all of the content. I did this with an HTTP module that ran on the PreRequestHandlerExecute event. But if you did something like this:

    protected override void OnPreInit(EventArgs e)
    {
        this.Response.Filter = new MyStream();
        base.OnPreInit(e);
    }

That would also most likely work.

3) You should be able to use something like Html Agility Pack to parse the HTML and modify it from there.

That to me seems like the easiest approach.
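
For step 1, a minimal sketch of such a Stream class could look like this. The class name `HtmlRewriteStream` and the `rewrite` delegate are hypothetical, and the sketch assumes UTF-8 output and a buffered response that is flushed once, at the end of the page. If Flush is called mid-page, the delegate only sees partial HTML.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical filter: buffers everything written, then transforms it on Flush.
public class HtmlRewriteStream : Stream
{
    private readonly Stream _inner;                  // the stream being wrapped
    private readonly Func<string, string> _rewrite;  // HTML in, HTML out
    private readonly MemoryStream _buffer = new MemoryStream();

    public HtmlRewriteStream(Stream inner, Func<string, string> rewrite)
    {
        _inner = inner;
        _rewrite = rewrite;
    }

    // Write-only stream: only Write and Flush do real work.
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Hold the bytes back instead of passing them straight through.
        _buffer.Write(buffer, offset, count);
    }

    public override void Flush()
    {
        if (_buffer.Length > 0)
        {
            // Assumes UTF-8; a real filter would use the response encoding.
            string html = Encoding.UTF8.GetString(_buffer.ToArray());
            byte[] bytes = Encoding.UTF8.GetBytes(_rewrite(html));
            _inner.Write(bytes, 0, bytes.Length);
            _buffer.SetLength(0);
        }
        _inner.Flush();
    }

    // Members required by the abstract Stream class but unused here.
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

In the OnPreInit override above, this would be installed by wrapping the existing filter, e.g. `this.Response.Filter = new HtmlRewriteStream(this.Response.Filter, html => html);`, with the lambda replaced by whatever parsing step is chosen in step 3.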

I'm leaning more towards overriding the page's render method and then passing the render method's parameters to my own function, which is in my own class somewhere else and does the processing via an HTML parser (yet to be decided). I'm slightly against implementing my own stream class because of all the methods which I would need to implement, and it seems more complicated this way. — Chris Cannon, Jun 14 '12 at 21:43
Actually the only methods that you really need to implement are write and flush. Everything else can be ignored for the most part because you're not reading from the stream nor seeking (well that and like 3 properties need to be implemented [CanSeek, CanRead, CanWrite], which takes about 20 seconds). — JaCraig, Jun 15 '12 at 20:24
Ok thanks for the tips, but I think implementing my own stream class is overkill for what I'm trying to do! — Chris Cannon, Jun 15 '12 at 21:07
