I'm trying to use regex to convert Slack's version of markdown formatting to BB Code. I'm stuck on links at the moment. Slack formats like this:

<www.url.com|This is the actual text>
<www.url.com>

BB Code formats like this:

[url=www.url.com]This is the actual text[/url]
[url]www.url.com[/url]

I'm handling the first type with this (in JavaScript):

string.replace(/\<([\s\S]+)(?=\|)\|([\s\S]*?)\>/gm, "[url=$1]$2[/url]")

I'm struggling to write a second rule that only matches text between <...> when there is no | inside it. Can anyone help me out?

Also if there's a neat way of dealing with both options in one go then let me know!

ChrisGPT was on strike
David

1 Answer

You can use

const text = `<www.url.com|This is the actual text>
<www.url.com>`;
// Group 2 (label) is undefined when the link has no "|label" part
console.log( text.replace(/<([^<>|]*)(?:\|([^<>]*))?>/g, (match, url, label) => label !== undefined ?
 `[url=${url}]${label}[/url]` : `[url]${url}[/url]`) )

See the regex demo. Details:

  • < - a < char (please NEVER escape this char in any regex flavor if you plan to match a < char)
  • ([^<>|]*) - Group 1: any zero or more chars other than <, > and |
  • (?:\|([^<>]*))? - an optional non-capturing group matching one or zero occurrences of a | and then any zero or more chars other than < and > captured into Group 2
  • > - a > char (again, please never escape the > literal char in any regex flavor).
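As a minimal sketch, the same pattern can be wrapped in a reusable function. The name `slackLinksToBBCode` and the sample sentence are hypothetical and not part of the answer; the regex itself is unchanged.

```javascript
// Hypothetical helper name; the regex is the one described in the list above.
function slackLinksToBBCode(input) {
  return input.replace(
    /<([^<>|]*)(?:\|([^<>]*))?>/g,
    // label is undefined when the optional "|label" group did not take part in the match
    (match, url, label) =>
      label !== undefined ? `[url=${url}]${label}[/url]` : `[url]${url}[/url]`
  );
}

console.log(slackLinksToBBCode("See <www.url.com|This is the actual text> or <www.url.com>."));
// prints: See [url=www.url.com]This is the actual text[/url] or [url]www.url.com[/url].
```

A replacement callback is used instead of a `"$1"`-style replacement string because the output format depends on whether Group 2 took part in the match.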
Wiktor Stribiżew
  • Could you give more info on why <> should not be escaped? – anotherGatsby Mar 29 '22 at 09:54
  • @anotherGatsby Because it is not a regex special metacharacter, and in some regex flavors `\<` / `\>` are word boundaries. Many users still think that "regex is universal" and will use the same pattern across languages/environments. – Wiktor Stribiżew Mar 29 '22 at 09:55
  • I knew that they are not special characters, but I did not know that they are word boundaries in some flavors. Thanks for the info. – anotherGatsby Mar 29 '22 at 09:58
  • Thanks for this super fast response! I'm just getting my head around your notes to make sure I understand it. – David Mar 29 '22 at 13:06
  • @WiktorStribiżew Am I right in thinking that this will fail if the URL contains any of the excluded characters (<, > and |)? Hopefully this isn't a common issue, but perhaps it could be a problem sometimes. Any ideas? – David Mar 29 '22 at 19:03
  • @David You cannot have `<` and `>` in a tag. It will be a corrupt code then, and that means no regex can match it. `|` is not a problem, it is optional. – Wiktor Stribiżew Mar 29 '22 at 19:15
  • @David Do you have any test cases that fail? Please update the question, or let me know if the current solution is good enough. – Wiktor Stribiżew May 31 '22 at 07:35
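On the question of `<`, `>` and `|` appearing inside a link, the sketch below runs the answer's regex on a few inputs. The sample strings and the `convert` helper are made up for illustration and do not come from the thread.

```javascript
const re = /<([^<>|]*)(?:\|([^<>]*))?>/g;

// Hypothetical helper applying the answer's replacement logic
const convert = (s) =>
  s.replace(re, (m, url, label) =>
    label !== undefined ? `[url=${url}]${label}[/url]` : `[url]${url}[/url]`);

// A "|" in the label is fine: only the first "|" splits the URL from the label.
console.log(convert("<www.url.com|a|b>")); // [url=www.url.com]a|b[/url]

// A "|" inside the URL is taken as the separator, so the URL is cut short.
console.log(convert("<www.url.com/x|y|label>")); // [url=www.url.com/x]y|label[/url]

// A stray "<" in the surrounding text is left alone; the match starts at the later "<".
console.log(convert("1 < 2 <www.url.com>")); // 1 < 2 [url]www.url.com[/url]
```

So a `|` only causes trouble when it appears in the URL part, because the first `|` is always treated as the separator.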