
I am writing a program to detect Markdown emphasis syntax in text, for example bold text enclosed in ** and italic text enclosed in *.

I have the following regex pattern:

NSRegularExpression *regex;
regex = [NSRegularExpression regularExpressionWithPattern:@"(\\*{1,2}).+?(\\*{1,2})"
    options:NSRegularExpressionDotMatchesLineSeparators
    error:NULL];

However, this pattern also matches mismatched delimiters. For example, matching against "* this is a **sample** text" returns "* this is a **" instead of "**sample**".
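A minimal sketch that reproduces the problem, assuming a Foundation command-line target; AllMatches is a hypothetical helper name used only here. Because the opening group and the closing group are matched independently, nothing forces the closing run of asterisks to have the same length as the opening one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for this sketch: collects the text of every match, in order.
static NSArray<NSString *> *AllMatches(NSString *pattern, NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    NSMutableArray<NSString *> *found = [NSMutableArray array];
    // With options 0 the block is only called for actual matches, so result is non-nil.
    [regex enumerateMatchesInString:text
                            options:0
                              range:NSMakeRange(0, text.length)
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        [found addObject:[text substringWithRange:result.range]];
    }];
    return found;
}

int main(void) {
    @autoreleasepool {
        // Opening and closing groups are independent, so a single "*" can pair with "**".
        NSArray<NSString *> *matches =
            AllMatches(@"(\\*{1,2}).+?(\\*{1,2})", @"* this is a **sample** text");
        NSLog(@"%@", matches); // one match: "* this is a **"
    }
    return 0;
}
```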

How can I solve this problem?

nhahtdh
Jensen

1 Answer


You could use a back reference, with this pattern:

(\*{1,2}).+?\1

This requires that whatever the first group captures (a single or double asterisk) appears again later in the match; \1 is the back-reference to that captured text.

For example:

NSRegularExpression *regex;
regex = [NSRegularExpression regularExpressionWithPattern:@"(\\*{1,2}).+?\\1"
    options:NSRegularExpressionDotMatchesLineSeparators
    error:NULL];
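A minimal sketch for checking the back-reference pattern against the question's sample string, assuming a Foundation command-line target. EmphasisMatches is a hypothetical helper, and the second pattern is an added variation for comparison, not part of the answer above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper for this sketch: returns the text of every match, in order.
static NSArray<NSString *> *EmphasisMatches(NSString *pattern, NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray<NSString *> *results = [NSMutableArray array];
    for (NSTextCheckingResult *match in
         [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)]) {
        [results addObject:[text substringWithRange:match.range]];
    }
    return results;
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"* this is a **sample** text";
        // The answer's pattern: the closing run must repeat the captured opening run.
        NSLog(@"%@", EmphasisMatches(@"(\\*{1,2}).+?\\1", text));
        // Assumed variation (not from the answer): the opening run must not be
        // followed by whitespace, and the closing run must not be preceded by it.
        NSLog(@"%@", EmphasisMatches(@"(\\*{1,2})(?!\\s).+?(?<!\\s)\\1", text));
    }
    return 0;
}
```

With standard backtracking, the back-reference pattern on its own still pairs the leading lone * with the first asterisk of **, so on this input it yields "* this is a *" and "*sample*". The whitespace lookarounds are only a rough approximation of Markdown's left- and right-flanking delimiter rules: they reject the leading "* " and yield just "**sample**" here, but they do not implement the full CommonMark emphasis algorithm.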
p.s.w.g