I'm trying to write a regex that matches user input which will then be rendered as italic text using Markdown.

In the string I need to find the following pattern: an asterisk followed by a non-whitespace character, and ending with a non-whitespace character followed by an asterisk.

So basically, `substring *substring substring substring* substring` should yield `*substring substring substring*`.

So far I have only come up with `/\*(?:(?!\*).)+\*/`, which matches everything between two asterisks, but it doesn't take into account whether the substring between the asterisks starts or ends with whitespace, which it shouldn't.
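To make the problem concrete, here is a minimal JavaScript sketch. The variable names and the second sample string are made up for illustration; only the regex and the first sample come from the text above. It shows that the attempt also accepts content padded with whitespace:

```javascript
// The attempt from the question: anything between two asterisks.
const attempt = /\*(?:(?!\*).)+\*/;

// First sample is the one from the question; the second is a hypothetical
// input whose inner text starts and ends with a space.
const wanted = "substring *substring substring substring* substring";
const unwanted = "a * b * c";

const wantedMatch = wanted.match(attempt);
const unwantedMatch = unwanted.match(attempt);

console.log(wantedMatch[0]);   // *substring substring substring*
console.log(unwantedMatch[0]); // * b *  (matched, although it should not be)
```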

Thank you for your input! :)

Vid
  • Try `/\*(?!\s).*?\*(?<!\s\*)/` – Wiktor Stribiżew Jul 16 '21 at 15:13
  • There's no lookahead necessary here, just do `/\*\S.*?\S\*/`. `\S` is the same as `[^\s]`, i.e. anything but whitespace. – isaactfa Jul 16 '21 at 15:19
  • @isaactfa However `\S` matches a `*` char, and `/\*\S.*?\S\*/` cannot match a `*.*` string with just one char between the asterisks. – Wiktor Stribiżew Jul 16 '21 at 15:22
  • @WiktorStribiżew Good point! It really should be `/\*[^\s\*](?:.*?[^\s\*])?\*/`. – isaactfa Jul 16 '21 at 15:24
  • Yes, or `/\*(?![\s*]).*?\*(?<![\s*]\*)/` then – Wiktor Stribiżew Jul 16 '21 at 15:34
  • `\*[^*\s](?:[^*]*[^*\s])?\*` would work – MonkeyZeus Jul 16 '21 at 17:11
  • Thank you guys! @MonkeyZeus I used your solution and it works! :) – Vid Jul 16 '21 at 17:36
  • I'm trying to add a regex for bold as well, which works basically the same way, but with two asterisks on both sides. However, the italic regex gets triggered first, as soon as I type the first asterisk on the right side: `**string*` -> `*italicString`. I need a condition that stops the italic regex from matching if the string starts with two asterisks. What do I need to add to the proposed solution? – Vid Jul 16 '21 at 18:17
  • I tried it like this: `/((?!^)\*\*)\*[^*\s](?:[^*]*[^*\s])?\*/` in addition to `/\*[^*\s](?:[^*]*[^*\s])?\*/`, but it doesn't work. – Vid Jul 16 '21 at 18:23
  • What other markdown features are you trying to add? Based on your programming language there is very likely a readily available parser library. – MonkeyZeus Jul 16 '21 at 19:11
  • You could make use of backreferences like this `([*]{1,2})[^*\s](?:[^*]*[^*\s])?\1` for detecting bold or italics but would need to use callbacks to see **which** formatting rule matched. https://regex101.com/r/AfrmoC/1 – MonkeyZeus Jul 16 '21 at 19:14
  • I asked this question just the other day for doing markdown URLs: https://stackoverflow.com/questions/68394029/regex-for-url-markdown – MonkeyZeus Jul 16 '21 at 19:16
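The backreference idea from the comments can be sketched in JavaScript roughly as follows. This is only an illustration under assumptions: the function name `renderEmphasis`, the HTML tags used as output, and the sample string are hypothetical, and a real Markdown parser handles far more cases.

```javascript
// Backreference pattern from the comment above: group 1 captures one or two
// asterisks, and \1 requires the same marker on the closing side.
const emphasis = /([*]{1,2})[^*\s](?:[^*]*[^*\s])?\1/g;

// Hypothetical helper: the replace callback inspects the captured marker
// to decide which formatting rule matched.
function renderEmphasis(text) {
  return text.replace(emphasis, (match, marker) => {
    // Strip the marker from both ends to get the inner text.
    const inner = match.slice(marker.length, -marker.length);
    return marker.length === 2
      ? `<strong>${inner}</strong>`
      : `<em>${inner}</em>`;
  });
}

console.log(renderEmphasis("a *b c* d **e f** g"));
// a <em>b c</em> d <strong>e f</strong> g
```

Checking `marker.length` inside the callback is what distinguishes bold from italic, since both are found by the same pattern.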

2 Answers

Use

\*(?![*\s])(?:[^*]*[^*\s])?\*

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  \*                       '*'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    [*\s]                    any character of: '*', whitespace (\n,
                             \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^*]*                    any character except: '*' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    [^*\s]                   any character except: '*', whitespace
                             (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \*                       '*'
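
As a quick check of the pattern above, here is a small JavaScript sketch. The helper name `findItalic` and all sample strings except the first are assumptions made for illustration.

```javascript
// Pattern from this answer: '*', not followed by '*' or whitespace,
// then optional content that ends in a non-'*', non-whitespace char, then '*'.
const italic = /\*(?![*\s])(?:[^*]*[^*\s])?\*/;

// Hypothetical helper: returns the first match, or null if there is none.
function findItalic(text) {
  const m = text.match(italic);
  return m ? m[0] : null;
}

console.log(findItalic("substring *substring substring substring* substring"));
// *substring substring substring*
console.log(findItalic("*a*"));      // *a*   (single character is fine)
console.log(findItalic("* a*"));     // null  (starts with whitespace)
console.log(findItalic("*a *"));     // null  (ends with whitespace)
console.log(findItalic("**bold**")); // *bold*  (inner part of a bold marker still matches)
```

The last case shows why an extra guard against a preceding or following asterisk is needed once bold (`**string**`) is handled separately.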
Ryszard Czech
  • Thank you very much for the explanation, it works great. I also made one addition: I put `(?<!\*)` in front of it so the pattern doesn't match if there are two asterisks at the start: `(?<!\*)\*(?![*\s])(?:[^*]*[^*\s])?\*`. This is useful because I also want to match the Markdown for bold, which is `**string**`. – Vid Jul 17 '21 at 08:04
/(?<!\*)\*(?![*\s])(?:[^*]*[^*\s])?\*(?!\*)/gim

This matches only italic text:

**Not matches this**
*Not this one also**
**Not again*
*It's will match*
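
A minimal JavaScript sketch that runs the regex against the four sample lines above (the variable names are made up for illustration). Only the `g` flag is kept here, since the pattern contains no letters or `^`/`$` anchors, so `i` and `m` have no effect on it. The lookbehind `(?<!\*)` needs an engine with ES2018 lookbehind support.

```javascript
// Italic-only pattern: no '*' directly before the opening marker
// and no '*' directly after the closing marker.
const italicOnly = /(?<!\*)\*(?![*\s])(?:[^*]*[^*\s])?\*(?!\*)/g;

const samples = [
  "**Not matches this**",
  "*Not this one also**",
  "**Not again*",
  "*It's will match*",
];

// String.prototype.match with the g flag returns all matches, or null.
const results = samples.map((line) => line.match(italicOnly));

console.log(results);
// [ null, null, null, [ "*It's will match*" ] ]
```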

Thanks to Ryszard Czech and Vid

Sachin Chillal