I would like to replace all strings that are enclosed by - with strings enclosed by ~, but not if the string is itself enclosed by *.

As an example, this string...

The -quick- *brown -f-ox* jumps.

...should become...

The ~quick~ *brown -f-ox* jumps.

We see - is only replaced if it is not within *<here>*.

My JavaScript regex so far (which does not yet check whether the match is enclosed by *):

var message = source.replace(/-(.[^-]+?)-/g, "~$1~");

Edit: Note that it might be the case that there is an odd number of *s.

poitroae
  • yup i am amazed, such a good question. – Jai Mar 28 '13 at 13:30
  • What when there is odd number of * characters? E.g. `The *-quick-* brown * -f-ox* jumps*.` Which `-` characters should be replaced and why? – Marek Musielak Mar 28 '13 at 13:32
  • @Maras The last `*` is not replaced. It is printed as `*` – poitroae Mar 28 '13 at 13:34
  • Firefox's `y` flag, [said to be proposed for ECMAScript 6](http://xregexp.com/flags/) would help a lot: https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/RegExp – minopret Mar 28 '13 at 13:36

4 Answers


That's a tricky sort of thing to do with regular expressions. I think what I'd do is something like this:

var msg = source.replace(/(-[^-]+-|\*[^*]+\*)/g, function(_, grp) {
  return grp[0] === '-' ? grp.replace(/^-(.*)-$/, "~$1~") : grp;
});

That looks for either - or * groups, and only performs the replacement on the dashed ones. In general, "nesting" syntaxes are challenging (or impossible) with regular expressions. (And of course, as a comment on the question notes, there are special cases, such as dangling metacharacters, that complicate this too.)
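To make the behaviour easier to check, here is a minimal sketch of the same alternation idea wrapped in a function. The name `tildify` is made up for illustration, and the callback uses the whole match instead of a capture group; the regex alternatives are the ones from the answer.

```javascript
// Hypothetical helper; the two alternatives are taken from the answer above.
function tildify(source) {
  return source.replace(/-[^-]+-|\*[^*]+\*/g, function (match) {
    // Starred spans are consumed by the second alternative and returned as-is.
    if (match.charAt(0) === '*') return match;
    // Dashed span: swap only the two outer dashes.
    return '~' + match.slice(1, -1) + '~';
  });
}

console.log(tildify('The -quick- *brown -f-ox* jumps.'));
// The ~quick~ *brown -f-ox* jumps.

// A dangling * has no partner, so it is skipped and later dashes are replaced.
console.log(tildify('The -quick- *brown -f-ox jumps.'));
// The ~quick~ *brown ~f~ox jumps.
```

Because the starred alternative consumes everything up to the closing `*`, the dashes inside it are never offered to the first alternative.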

Mark Pieszak - Trilon.io
Pointy
  • Working example : http://jsfiddle.net/Zb6BU/ - not sure why this isn't getting up votes, this works just as intended! +1 – Mark Pieszak - Trilon.io Mar 28 '13 at 13:52
  • @Bergi I see that now, +1's for all :) haha – Mark Pieszak - Trilon.io Mar 28 '13 at 14:02
  • @Pointy: The group is really unnecessary, it matches what the first argument contains… And you shouldn't use bracket notation on strings. – Bergi Mar 28 '13 at 14:02
  • @Bergi yes that's probably true; it's just a habit. I don't know what you mean about bracket notation. Oh - you mean instead of `.charAt()`? Are there modern browsers that don't do that? – Pointy Mar 28 '13 at 14:14
  • @Bergi [it's in the spec :-)](http://www.ecma-international.org/ecma-262/5.1/#sec-15.5.5.2) – Pointy Mar 28 '13 at 14:16
  • Could you please explain how it would look like if you replace all `:D`, but not if they're between `-`s? – poitroae Apr 01 '13 at 01:26
  • @poitroae - hmm, that's a somewhat different problem; I'd have to think about it. You may want to ask a separate question. – Pointy Apr 01 '13 at 04:57

I would solve it by splitting the string on * and then replacing only in the even-indexed pieces (the odd-indexed ones sit between a pair of stars). Matching unbalanced stars is trickier: it involves checking whether the last index is odd, which means its opening * was never closed:

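'The -quick- *brown -f-ox* jumps.'
    .split('*')
    .map(function(item, index, arr) {
        if (index % 2) {
            if (index < arr.length - 1) {
                // balanced: restore the stars that split() removed
                return '*' + item + '*';
            }
            // not balanced: restore the lone star, then replace as usual
            item = '*' + item;
        }
        // the g flag replaces every dashed pair in the segment
        return item.replace(/-([^-]+)-/g, '~$1~');
    })
    .join('');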
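A variation on the same split idea, sketched as a reusable function (the name `tildifyBySplit` is hypothetical). Joining with `'*'` puts back every star that `split` removed, so the unbalanced case needs no special string patching; only the "is this piece between two stars?" test remains.

```javascript
// Hypothetical helper illustrating the split-on-* approach.
function tildifyBySplit(source) {
  var parts = source.split('*');
  return parts.map(function (part, index) {
    // Odd-indexed pieces sit between two stars, unless the piece is the
    // last one, in which case its opening star was never closed.
    var insideStars = index % 2 === 1 && index < parts.length - 1;
    return insideStars ? part : part.replace(/-([^-]+)-/g, '~$1~');
  }).join('*');
}

console.log(tildifyBySplit('The -quick- *brown -f-ox* jumps.'));
// The ~quick~ *brown -f-ox* jumps.

console.log(tildifyBySplit('a *b -c- d'));
// a *b ~c~ d
```

One limitation of this sketch: a dashed pair whose two dashes fall on different sides of a star is never matched, since each piece is processed on its own.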

Ja͢ck

Finding out whether a match is not enclosed by some delimiters is a very complicated task. Lookaround could help, but JS (as of ES5) only supports lookahead. So we could rewrite "not surrounded by *" as "followed by an even number of *", and match on that:

source.replace(/-([^-]+)-(?=[^*]*(?:\*[^*]*\*[^*]*)*$)/g, "~$1~");
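A sketch of the lookahead idea for the `*` delimiter used in the question, with two assumptions that are mine and not the answer's: the function name `tildifyByLookahead` is made up, and the dashed content excludes `*` (`[^-*]+`) so that a match rejected inside a starred span cannot restart on its closing dash and run across the star.

```javascript
// Hypothetical helper. A dashed pair is replaced only if the rest of the
// string contains an even number of * characters (possibly zero).
function tildifyByLookahead(source) {
  return source.replace(/-([^-*]+)-(?=[^*]*(?:\*[^*]*\*[^*]*)*$)/g, '~$1~');
}

console.log(tildifyByLookahead('The -quick- *brown -f-ox* jumps.'));
// The ~quick~ *brown -f-ox* jumps.

console.log(tildifyByLookahead('*a -b- c* -d-'));
// *a -b- c* ~d~

// Limitation: one dangling * makes the count odd for everything before it.
console.log(tildifyByLookahead('The -quick- *brown'));
// The -quick- *brown
```

The last call shows why this approach copes poorly with an odd number of stars: it counts stars to the end of the string instead of pairing them up.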

But it is better to match on both - and *, so that we consume anything wrapped in *s as well and can then decide in a callback function not to replace it:

source.replace(/-([^-]+)-|\*([^*]+)\*/g, function(m, hyp) {
    if (hyp) // the first group has matched
        return "~"+hyp+"~";
    // else let the match be unchanged:
    return m;
});

This has the advantage of being able to better specify "enclosed", e.g. by adding word boundaries on the "inside", for better handling of invalid patterns (odd number of * characters as mentioned by @Maras for example) - the current regex just takes the next two appearances.
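One possible reading of the word-boundary suggestion, as a rough sketch: the function name `tildifyStrict` and the exact placement of `\b` are assumptions, not something the answer spells out. A delimiter only counts here if a word character sits directly inside it.

```javascript
// Hypothetical helper: \b just inside each delimiter means "- " or "* "
// (delimiter followed by a space) no longer opens a span.
function tildifyStrict(source) {
  return source.replace(/-\b([^-]+)\b-|\*\b[^*]+\b\*/g, function (match, hyp) {
    // hyp is set only when the dashed alternative matched.
    return hyp ? '~' + hyp + '~' : match;
  });
}

console.log(tildifyStrict('The -quick- *brown -f-ox* jumps.'));
// The ~quick~ *brown -f-ox* jumps.

// Free-standing stars and dashes are ignored, so -ok- is still replaced.
console.log(tildifyStrict('2 * 3 -ok- 4 * 5'));
// 2 * 3 ~ok~ 4 * 5
console.log(tildifyStrict('a - b - c'));
// a - b - c
```

The trade-off is that a span such as `*-quick-*` no longer counts as starred, because there is no word boundary between `*` and `-`; the inner dashes would then be replaced.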

Community
Bergi

A terser version of Jack's very clear answer.

source.split(/(\*[^*]*\*)/g).map(function(x, i) {
    return i % 2 ? x : x.replace(/-/g, '~');
}).join('');

Seems to work, Cheers.
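Note that `/-/g` turns every dash outside the starred chunks into `~`, including a lone hyphen. A hedged variant that only touches dashed pairs might look like this (the name `tildifyTerse` is hypothetical; the pair regex is borrowed from the other answers):

```javascript
// Hypothetical helper. The capturing group makes split() keep the starred
// chunks in the result, at odd indices.
function tildifyTerse(source) {
  return source.split(/(\*[^*]*\*)/).map(function (chunk, i) {
    return i % 2 ? chunk : chunk.replace(/-([^-]+)-/g, '~$1~');
  }).join('');
}

console.log(tildifyTerse('The -quick- *brown -f-ox* jumps.'));
// The ~quick~ *brown -f-ox* jumps.

// A single hyphen has no partner and is left alone.
console.log(tildifyTerse('well-known *x*'));
// well-known *x*
```

Pairing is still purely positional: a lone hyphen followed later by a dashed word in the same chunk would be paired with that word's opening dash.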

d'alar'cop