
I have pieces of text where normal markdown and a custom markdown extension are mixed together. This works quite well.

[My Link Title](http://example.com)

(extension: value attribute: value)

However, I have one problem: to apply some styling while the text is being edited, I need a way to match the opening bracket of an extension snippet without matching the opening bracket of a Markdown link.

In other words: I need a regular expression (that works in JavaScript) to match an opening bracket (and only the bracket) when it is

  1. followed by [a-z0-9]+: and
  2. not preceded by a ] character.

My current regular expression for that looks like this: /\((?=[a-z0-9]+:)/i. It works well for the opening brackets of extension tags but, unfortunately, also matches the opening brackets of Markdown links.
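To make the problem concrete, here is a small check (Node.js; the sample string just reuses the two examples above) showing that the current pattern, with a `g` flag added so every match is listed, hits both the link's `(` and the extension's `(`:

```javascript
// Sample text from the question: one Markdown link, then one extension snippet.
const text = "[My Link Title](http://example.com)\n\n(extension: value attribute: value)";

// The current pattern; the g flag is added only so matchAll can list every hit.
const current = /\((?=[a-z0-9]+:)/gi;

// "(http:" also satisfies the lookahead, so the link's "(" is matched too.
const positions = [...text.matchAll(current)].map((m) => m.index);
console.log(positions); // [ 15, 37 ]
```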

I have seen people check for this in PHP by putting a negated lookaround at the beginning of the regular expression, like this: /(?=[^\]])\((?=[a-z0-9]+:)/i. Unfortunately, this doesn't seem to work in JavaScript.
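Why the quoted pattern fails: `(?=[^\]])` is a lookahead, so it tests the same character that `\(` then consumes. That character is `(`, which is never `]`, so the check always passes in any engine. The PHP snippets presumably used a negative lookbehind, `(?<!\])`. JavaScript had no lookbehind when this question was asked, but ECMAScript 2018 added it, so in current engines (Node.js 10 and later, current Chrome and Firefox; Safari only gained it in 16.4) a sketch like the following should work. Whether your editor accepts it depends on the browser it runs in:

```javascript
// (?<!\])          negative lookbehind: the "(" must not be preceded by "]".
// (?=[a-z0-9]+:)   positive lookahead: the "(" must be followed by a name and a colon.
// Only the "(" itself is consumed, so each match is exactly one character.
const extensionBracket = /(?<!\])\((?=[a-z0-9]+:)/gi;

const text = "[My Link Title](http://example.com)\n\n(extension: value attribute: value)";

const positions = [...text.matchAll(extensionBracket)].map((m) => m.index);
console.log(positions); // [ 37 ]
```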


Update

Thanks for your tips!

The problem I'm having is that I'm creating a "Simple Mode" syntax mode for CodeMirror to apply the highlighting. It lets you specify a regex and a token that is applied to the matched characters, but it doesn't allow any further operations on the matches. You could write a full syntax mode where this kind of operation is possible, but I'm not capable of that :-s

In the end, I went with another solution: I created two regular expressions.

  1. Match all opening extension brackets that have a preceding character other than "]":
    /[^\]]\((?=[a-z0-9]+:)/i
  2. Match all opening extension brackets that have no preceding character (start of the input):
    /^\((?=[a-z0-9]+:)/i
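A sketch of how the two patterns could be combined outside the editor, assuming you want the index of each extension bracket. Note that the first pattern consumes the character before the `(`, so a highlighter using it colours two characters; the helper `extensionBracketIndices` is hypothetical and simply adds 1 to compensate:

```javascript
// Pattern 1: some character other than "]" precedes the "(" (and is consumed).
const withPrefix = /[^\]]\((?=[a-z0-9]+:)/gi;
// Pattern 2: the "(" is at the very start of the input (no preceding character).
const atStart = /^\((?=[a-z0-9]+:)/i;

// Hypothetical helper: returns the index of every extension "(" in `text`.
function extensionBracketIndices(text) {
  const indices = [];
  if (atStart.test(text)) indices.push(0);
  for (const m of text.matchAll(withPrefix)) {
    indices.push(m.index + 1); // skip the consumed prefix character
  }
  return indices;
}

console.log(extensionBracketIndices("(a: 1) [x](http://y) and (b: 2)")); // [ 0, 25 ]
```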

Even though it isn't the cleanest possible way, it seems to work quite well for now.

DieserJonas
  • You are asking for a negative look-behind, which is unavailable in JS. Also, there is no known workaround for a case when you need both a look-ahead and a look-behind (like string reversing). You will have to match more than just the bracket and use capturing groups. Something like `/(^|[^\]])\((?=[a-z0-9]+:)/i`. – Wiktor Stribiżew Apr 29 '15 at 08:25
  • @stribizhev - That is generally true, but in this case you can probably reverse the string and use `\b\((?!\])`, which is pretty simple. – Kobi Apr 29 '15 at 09:05
  • @Kobi: True, the only character to exclude here is `]`, which is a non-word character. Great! So, there would be no solution if the look-ahead and look-behind were of variable width. – Wiktor Stribiżew Apr 29 '15 at 09:22
  • @Kobi **Thanks for your hints!** Please see my updated question for the solution I am using for now. – DieserJonas Apr 29 '15 at 11:36
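A sketch of the two workarounds from the comments in plain JavaScript (the function names are hypothetical). The first uses `/(^|[^\]])\((?=[a-z0-9]+:)/gi` and offsets each match index by the length of group 1, since that group holds the consumed preceding character (or nothing at the start). The second reverses the string so the look-behind becomes a look-ahead; note that `\b` only checks that a word character follows the `(`, not the full `name:` part, so it is looser than the original requirement:

```javascript
// Workaround 1: capture the preceding character instead of looking behind.
const captured = /(^|[^\]])\((?=[a-z0-9]+:)/gi;

function bracketIndicesByCapture(text) {
  // m[1] is "" at the start of input, otherwise the one consumed character.
  return [...text.matchAll(captured)].map((m) => m.index + m[1].length);
}

// Workaround 2: reverse the string. "(extension:" becomes ":noisnetxe(",
// and a link's "](" becomes "(]", so a look-ahead can reject it.
const reversedPattern = /\b\((?!\])/g;

function bracketIndicesByReversal(text) {
  // split("") keeps UTF-16 units, so indices line up with text.length.
  const reversed = text.split("").reverse().join("");
  return [...reversed.matchAll(reversedPattern)]
    .map((m) => text.length - 1 - m.index) // map back to the original index
    .reverse(); // restore left-to-right order
}

const sample = "[My Link Title](http://example.com)\n\n(extension: value attribute: value)";
console.log(bracketIndicesByCapture(sample)); // [ 37 ]
console.log(bracketIndicesByReversal(sample)); // [ 37 ]
```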

1 Answer


Using a skip and match trick:

\[[^\]]+\]\([^\)]+\)|(\(\b)
  • \[[^\]]+\]\([^\)]+\) - match [text](url) links (you can also write \[.*?\]\(.*?\) if this is too confusing), or
  • (\(\b) - match and capture an opening parenthesis that is directly followed by a word character (letter, digit or underscore).

Working example: https://regex101.com/r/tY9sS4/1

You would have to inspect the results and process only the matches where group $1 captured something, ignoring all other matches.
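A minimal sketch of that post-processing step, using the pattern above with a `g` flag; the function name `bracketIndicesSkippingLinks` is hypothetical:

```javascript
// First alternative consumes whole [text](url) links so their "(" is skipped;
// second alternative captures a "(" that is directly followed by a word character.
const skipOrMatch = /\[[^\]]+\]\([^\)]+\)|(\(\b)/g;

function bracketIndicesSkippingLinks(text) {
  const indices = [];
  for (const m of text.matchAll(skipOrMatch)) {
    // Group 1 is undefined when the link alternative matched: ignore those.
    if (m[1] !== undefined) indices.push(m.index);
  }
  return indices;
}

const sample = "[My Link Title](http://example.com)\n\n(extension: value attribute: value)";
console.log(bracketIndicesSkippingLinks(sample)); // [ 37 ]
```

As in the answer, the second alternative only requires a word character after the `(`, not the full `name:` part; adding `(?=[a-z0-9]+:)` after `\(` would make it stricter.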

Kobi
  • Nice solution! The only problem I'm having is that I'm creating a "Simple Mode" syntax mode for CodeMirror for the highlighting. It lets you specify a regex and a token that is applied to the matched characters, but it doesn't allow any further operations on the matches. You could write a full syntax mode where these kinds of operations are possible, but I'm not capable of that :-s – DieserJonas Apr 29 '15 at 11:24
  • @DieserJonas - Oh, that's a shame! You did say "match an opening bracket (and only the bracket)". You might have a hard time with the JavaScript regex engine; it is pretty weak... Thanks! – Kobi Apr 29 '15 at 11:29
  • @DieserJonas - Also, I don't know CodeMirror, but can you match `[...](...)` in one syntax rule before you are looking for `(`? Usually these tools have priorities. – Kobi Apr 29 '15 at 11:31
  • I just edited my original question with the solution I'm using for now. The problem is that this "Simple Mode" is basically a state machine that switches between states and applies token types based on regexes. This is kind of limited. – DieserJonas Apr 29 '15 at 11:44
  • If I were able to write a full syntax mode, I think I could use your solution quite well. Unfortunately, that's a bit above my current skill level. I will probably revisit this issue some time in the future and build a nicer solution based on your answer. – DieserJonas Apr 29 '15 at 11:48
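To illustrate Kobi's point about rule priority without relying on CodeMirror itself, here is a hypothetical mini tokenizer (not CodeMirror's API) in which rules are tried in order and the first one that matches at the current position wins. Because the link rule comes first and consumes the whole `[...](...)`, the bracket rule never sees the link's `(`. Whether CodeMirror's Simple Mode applies its rules exactly this way is an assumption to check against its documentation:

```javascript
// Hypothetical rule list; each regex is sticky (y), so it only matches at lastIndex.
const rules = [
  { regex: /\[[^\]]+\]\([^\)]+\)/y, token: "link" },
  { regex: /\((?=[a-z0-9]+:)/iy, token: "extension-bracket" },
];

function tokenize(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    for (const { regex, token } of rules) {
      regex.lastIndex = pos;
      const m = regex.exec(text);
      if (m && m[0].length > 0) {
        tokens.push({ token, start: pos, text: m[0] });
        pos += m[0].length;
        matched = true;
        break; // first matching rule wins
      }
    }
    if (!matched) pos += 1; // no rule applies here: skip one character
  }
  return tokens;
}

const sample = "[My Link Title](http://example.com)\n\n(extension: value attribute: value)";
console.log(tokenize(sample).map((t) => `${t.token}@${t.start}`));
// [ 'link@0', 'extension-bracket@37' ]
```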