CsQuery - Efficiently selecting, iterating and changing state e.g. For loop vs Each method

Question

Using CsQuery, I need to modify a set of anchor elements, e.g.:

'...

Dim cqContext = WebForms.CreateFromRender(Page, AddressOf MyBase.Render, writer)

Dim cq = cqContext.Dom

Dim foo = cq("a")

'...

For example, on all anchor elements I need to replace "/MyFolderName/" at the start of the href with "../".

I could simply do this:

For Each i In cq("a")
    i.Attributes("href") = Regex.Replace(i.Attributes("href"), "^/MyFolderName/", "../", RegexOptions.IgnoreCase)
Next

Or is it more efficient to use the Each utility method?

Should I also narrow my selector to cq("a[href^=""/MyFolderName/""]")?

Or is there an even better way?

Edit: By efficient I don't just mean least expensive; I'm also looking for more elegant ways of doing it, e.g. less code or fewer operations.

Specifically:

Should I be using the Each utility method?
Should I be narrowing my selector as above?

Searches depend a lot on the size of the collection. I think your best bet is to try them all and test the speed. — the_lotus, Jul 30 '13 at 16:46
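A minimal sketch of the kind of speed test suggested in this comment, in C# using only `System.Diagnostics.Stopwatch`. `SpeedCheck.Measure` is a hypothetical helper, not part of CsQuery; it makes one untimed warm-up call so first-call costs such as JIT compilation are not counted:

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper for comparing candidate approaches; not part of CsQuery.
public static class SpeedCheck
{
    // Calls `action` once untimed (warm-up), then `iterations` more times
    // under a Stopwatch, and returns the total elapsed time of the timed calls.
    public static TimeSpan Measure(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations < 1) throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up, not timed

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Each candidate (the For Each loop, the Each version, the narrowed selector) would be passed in as the `action`. Note that the rewrite mutates the document: after the first pass there is nothing left to replace, so later iterations measure less work unless each one starts from a fresh copy.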

score 1 · Answer 1 · answered Jul 30 '13 at 16:34

In asking this question, there is an implicit assumption that you have implemented one of these solutions and, through instrumentation, found that it is the source of an unacceptable slow-down. If you haven't done that, I'll defer to Donald Knuth:

Premature optimization is the root of all evil.

Andrew Coonce

You're right in your assumption, but I have a saying for assumptions too: "assumptions are the mother of all ..." you get the idea. Actually I thought it was a speed issue, but it turns out it wasn't. Regardless, I worded the question to encourage suggestions on how to improve the code, e.g. less code or fewer operations, and so that I can learn from developers experienced in this area. – Chris Cannon Jul 30 '13 at 17:40
In that case, the most important thing is to write the code in a consistent, predictable way across your team. I don't think there's a right answer here, save for whatever you want to codify as a best practice for your team. – Andrew Coonce Jul 30 '13 at 17:43
I agree with what you're saying; in my case I am the team! Just me... I always try to be consistent. – Chris Cannon Jul 30 '13 at 17:50

Jamie Treworgy · Accepted Answer · answered Jul 30 '13 at 20:52

From the perspective of the way CsQuery implements the things in question here:

Using for vs Each will make no substantive difference.
Using cq("a[href^='/MyFolderName/']") might perform differently from the approach you coded, but only because CsQuery implements the substring search differently. CsQuery uses an index to locate elements based on selectors for class, id, attribute name, and tag name, and that part is very fast. But the substring search is still done the old-fashioned way: each node must be scanned to see if it matches the substring. The code is here:
```
        case AttributeSelectorType.Contains:
            return value != null && value.IndexOf(selector.AttributeValue,
                selector.AttributeValueStringComparison)>=0;
```

So the real question is whether IndexOf is faster than a regex search and replace. (My guess? Probably, since it's a single-purpose method.)
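The two candidates can be compared directly in plain .NET. A minimal C# sketch, assuming the prefix and replacement from the question; the class and method names are hypothetical, and `StartsWithPrefix` is only an analogue of a "starts with" (`^=`) attribute test, not CsQuery's own source:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers: two ways of rewriting a leading "/MyFolderName/" to "../".
public static class HrefRewrite
{
    private const string Prefix = "/MyFolderName/";
    private const string Replacement = "../";

    // Built once and reused, instead of constructing the pattern on every call.
    private static readonly Regex PrefixPattern =
        new Regex("^" + Regex.Escape(Prefix), RegexOptions.IgnoreCase);

    // Regex approach, as in the question. A null href is returned as-is,
    // because Regex.Replace throws ArgumentNullException on null input.
    public static string WithRegex(string href)
    {
        return href == null ? null : PrefixPattern.Replace(href, Replacement);
    }

    // Single-purpose check: does the value start with the prefix (case-insensitive)?
    public static bool StartsWithPrefix(string href)
    {
        return href != null && href.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase);
    }

    // Non-regex approach: test the prefix, then splice in the replacement.
    public static string WithStartsWith(string href)
    {
        return StartsWithPrefix(href)
            ? Replacement + href.Substring(Prefix.Length)
            : href;
    }
}
```

For example, `HrefRewrite.WithStartsWith("/MyFolderName/page.aspx")` gives `"../page.aspx"`, the same result as the regex version. Either method could be called from the question's For Each loop; with the selector narrowed to `a[href^='/MyFolderName/']`, the prefix check inside the loop becomes redundant, although CsQuery still scans each candidate href to evaluate the selector.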

But at the end of the day I also agree with the comment about premature optimization. I would prefer to write the selector to target exactly what you want, whenever it is possible through selector syntax:

cq("a[href^='/MyFolderName/']")

since it is expressive and compact. If for some reason the selector syntax turned out to be a lot slower than using some other method to narrow the selection beyond what CsQuery indexes, then you can always change it later.

There's also a built-in regex selector based on James Padolsey's jQuery extension; see this blog post for a description and usage. I would expect it to perform about the same as your method :)

Finally: in most situations, by far the most time CsQuery spends is on parsing your document in the first place. Whatever you do after that is unlikely to have a significant performance impact compared to that initial parse. But if you find that it matters, you have options there too: alternate implementations of the indexing strategy are available that can be targeted towards the way you intend to use the document after it's parsed.

Thanks for the detailed answer. It may seem like premature optimisation, but really it's about the "professional" way of doing things vs the "amateur" way, not helped by my coding in VB.NET (fortunately I can understand C# code somewhat). I will consider changing my selector to include the starts-with check. By the way, isn't the code you posted in your answer for the contains selector? — Chris Cannon, Jul 30 '13 at 21:24
