7

I forked the excellent zen-coding project, with an idea to implement DOM ascension using a ^ - so you can do:

html>head>title^body>h1 rather than html>(head>title)+body>h1

Initially I implemented this with rather shoddy regex methods. I have now reimplemented it using @Jordan's excellent answer. My fork is here

What I still want to know

Are there any scenarios where my function returns the wrong value?

Billy Moon
  • That doesn't look like something you can do with just a regular expression. I'd figure out exactly what your grammar is and then write a real parser for it. – Pointy Apr 01 '11 at 12:29
  • I think it already has a very good tokenizing parser - just that I don't know how it works. I was hoping someone could assist. – Billy Moon Apr 01 '11 at 12:35
  • Some things are too complex to sanely process in regex, perhaps you should have some code to do this instead. For a massive project I worked on I used a mixture of straight string replacement, regex, and logic statements in code to process html documents. Tokenizing parsers cannot alone perform much logic and as a result are sometimes unable to handle weird cases. – Michael Shopsin Apr 01 '11 at 13:49
  • @tchrist will be able to write a regex for this I bet... – El Ronnoco Feb 09 '12 at 10:41
  • This belongs on [Codereview.SE]. – zzzzBov Feb 09 '12 at 22:24
  • @zzzzBov Huh, I did not know that site existed. Thank you for adding to my Stack Exchange knowledge! :) – Jordan Gray Feb 09 '12 at 23:16
  • If the question was on Code Review, then @Jordan would not have seen it, and could not have given his excellent and useful answer! – Billy Moon Feb 10 '12 at 07:54

2 Answers

4

Disclaimer: I have never used zen-coding and this is only my second time hearing about it, so I have no idea what the likely gotchas are. That said, this seems to be a working solution, or at least very close to one.

I am using Zen Coding for textarea v0.7.1 for this. If you are using a different version of the codebase you will need to adapt these instructions accordingly.

A couple of commenters have suggested that this is not a job for regular expressions, and I agree. Fortunately, zen-coding has its own parser implementation, and it's really easy to build on! There are two places where you need to add code to make this work:

  1. Add the ^ character to the special_chars variable in the isAllowedChar function (starts circa line 1694):

    function isAllowedChar(ch) {
        ...
        special_chars = '#.>+*:$-_!@[]()|^'; // Added ascension operator "^"
    
  2. Handle the new operator in the switch statement of the parse function (starts circa line 1541):

    parse: function(abbr) {
        ...
        while (i < il) {
            ch = abbr.charAt(i);
            prev_ch = i ? abbr.charAt(i - 1) : '';
            switch (ch) {
                ...
                // YOUR CODE BELOW
                case '^': // Ascension operator
                    if (!text_lvl && !attr_lvl) {
                        dumpToken();
                        context = context.parent.parent.addChild();
                    } else {
                        token += ch;
                    }
                    break;
    

    Here's a line-by-line breakdown of what the new code does:

    case '^':                         // Current character is ascension operator.
        if (!text_lvl && !attr_lvl) { // Don't apply in text/attributes.
            dumpToken();              // Operator signifies end of current token.
    
                                      // Shift context up two levels.
            context = context.parent.parent.addChild();
    
        } else {
            token += ch;              // Add char to token in text/attribute.
        }
        break;
    

The implementation above works as expected for e.g.:

html>head>title^body
html:5>div#first>div.inner^div#second>div.inner
html:5>div>(div>div>div^div)^div*2
html:5>div>div>div^^div

You will doubtless want to try some more advanced, real-world test cases. Here's my modified source if you want a kick-start; replace your zen_textarea.min.js with this for some quick-and-dirty testing.

Note that this merely ascends the DOM by two levels and does not treat the preceding elements as a group, so e.g. div>div^*3 will not work like (div>div)*3. If this is something you want then look at the logic for the closing parenthesis character, which uses a lookahead to check for multiplication. (Personally, I suggest not doing this, since even for an abbreviated syntax it is horribly unreadable.)
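
To make the context shift concrete, here is a minimal, self-contained sketch of the same idea. It is not zen-coding's parser: `Node`, `parseAbbr` and `render` are hypothetical names invented for this illustration, and it only understands `>`, `+` and `^`. The guard against climbing above the root and the clean-up of the unnamed placeholder node on `^^` are assumptions of the sketch, not behaviour taken from the real codebase.

```javascript
// Toy illustration only: this is NOT zen-coding's parser.
// Node, parseAbbr and render are hypothetical names made up for this sketch.
function Node(parent) {
    this.name = '';
    this.parent = parent;
    this.children = [];
}

Node.prototype.addChild = function () {
    var child = new Node(this);
    this.children.push(child);
    return child;
};

function parseAbbr(abbr) {
    var root = new Node(null);
    var context = root.addChild(); // node that the current token will name
    var token = '';

    function dumpToken() {
        if (token) {
            context.name = token;
        }
        token = '';
    }

    for (var i = 0; i < abbr.length; i++) {
        var ch = abbr.charAt(i);
        switch (ch) {
            case '>': // Next element is a child of the current one.
                dumpToken();
                context = context.addChild();
                break;
            case '+': // Next element is a sibling of the current one.
                dumpToken();
                context = context.parent.addChild();
                break;
            case '^': // Next element is a sibling of the current one's parent.
                dumpToken();
                if (!context.name) {
                    // Unnamed placeholder (e.g. left by a preceding "^"):
                    // drop it. The context is always its parent's last child.
                    context.parent.children.pop();
                }
                // Assumed guard: at the top level, fall back to a sibling
                // instead of climbing above the root.
                context = (context.parent.parent || context.parent).addChild();
                break;
            default:
                token += ch;
        }
    }
    dumpToken();
    return root;
}

// Serialise the tree as nested tags; unnamed nodes contribute only children.
function render(node) {
    var inner = node.children.map(render).join('');
    return node.name
        ? '<' + node.name + '>' + inner + '</' + node.name + '>'
        : inner;
}

console.log(render(parseAbbr('html>head>title^body>h1')));
// prints: <html><head><title></title></head><body><h1></h1></body></html>
```

In this sketch `div>div>div^^div` places the last `div` next to the outermost one, because the second `^` discards the empty placeholder created by the first before climbing again.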

Jordan Gray
  • I tested it all out, it works great. It is also an education for me on how the parser works (I still have much to learn). You really made the problem seem simple. Thanks. I also agree that `div>div^*3` is confusing syntax and I see no benefit really. – Billy Moon Feb 09 '12 at 22:06
  • Thanks @Billy! Glad to help. The `expandAbbreviation` method seems like a good place to start if you really want to grok what's going on; follow the calls from there and you'll be taken on a tour of most of the important stuff. :) – Jordan Gray Feb 09 '12 at 23:15
-1

You should look for an equivalent of Perl's Text::Balanced in the language that you're using.

Michael Spector