How do I use regex to convert Slack URLs to BB Code?

Question

I'm trying to use regex to convert Slack's version of markdown formatting to BB Code. I'm stuck on links at the moment. Slack formats like this:

<www.url.com|This is the actual text>
<www.url.com>

BB Code formats like this:

[url=www.url.com]This is the actual text[/url]
[url]www.url.com[/url]

I'm handling the first type with this (in JavaScript):

string.replace(/\<([\s\S]+)(?=\|)\|([\s\S]*?)\>/gm, "[url=$1]$2[/url]")

I'm struggling to make a second rule that will only match text between <...> if there isn't a | in the string. Can anyone help me out?

Also if there's a neat way of dealing with both options in one go then let me know!

That's not Markdown. Please only use the [tag:markdown] tag for questions about Markdown. — ChrisGPT was on strike, Mar 29 '22 at 12:09
Apologies, Slack calls it their version of Markdown but I agree, it is very different! — David, Mar 29 '22 at 13:05

score 2 · Answer 1 · answered Mar 29 '22 at 09:48


You can use

const text = `<www.url.com|This is the actual text>
<www.url.com>`;
// Group 1 is the URL; Group 2 (the label) is undefined when there is no "|".
console.log(text.replace(/<([^<>|]*)(?:\|([^<>]*))?>/g, (match, url, label) =>
  label !== undefined ? `[url=${url}]${label}[/url]` : `[url]${url}[/url]`));

Details:

< - a < char (please NEVER escape this char in any regex flavor if you plan to match a < char)
([^<>|]*) - Group 1: any zero or more chars other than <, > and |
(?:\|([^<>]*))? - an optional non-capturing group matching one or zero occurrences of a | and then any zero or more chars other than < and > captured into Group 2
> - a > char (again, please never escape the > literal char in any regex flavor).
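
To make the breakdown above easier to reuse, here is a minimal sketch that wraps the same regex in a function. The name `slackLinksToBBCode` is made up for illustration and is not part of any Slack or BB Code library; the pattern and the replacement logic are the ones from this answer.

```javascript
// Hypothetical helper name; the regex is the one explained above.
function slackLinksToBBCode(input) {
  return input.replace(/<([^<>|]*)(?:\|([^<>]*))?>/g, (match, url, label) =>
    // label is undefined when the optional "|..." part did not participate.
    label !== undefined ? `[url=${url}]${label}[/url]` : `[url]${url}[/url]`
  );
}

console.log(slackLinksToBBCode("<www.url.com|This is the actual text>"));
// [url=www.url.com]This is the actual text[/url]
console.log(slackLinksToBBCode("<www.url.com>"));
// [url]www.url.com[/url]
```

Checking for `undefined` rather than truthiness means an empty label such as `<www.url.com|>` still takes the `[url=...]` branch.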

answered Mar 29 '22 at 09:48

Wiktor Stribiżew


Could you give more info on why <> should not be escaped? – anotherGatsby Mar 29 '22 at 09:54

@anotherGatsby Because it is not a regex special metacharacter and in some regex flavors, `\<` / `\>` are word boundaries. And still a great amount of users think that "regex is universal" and will use the same pattern across languages/environments. – Wiktor Stribiżew Mar 29 '22 at 09:55
I knew that they are not special characters but did not know that they were word boundaries in some flavor. Thanks for the info. – anotherGatsby Mar 29 '22 at 09:58
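
As a side note on the escaping point, JavaScript itself gives one more reason to leave angle brackets unescaped. The sketch below assumes a standard ECMAScript engine such as Node.js or a modern browser: without the `u` flag `\<` is accepted as a plain `<`, but with the `u` flag the same pattern is rejected.

```javascript
// Without the "u" flag, JavaScript treats \< and \> as plain "<" and ">".
const legacy = new RegExp("\\<x\\>");
console.log(legacy.test("<x>")); // true

// With the "u" flag, \< is not a valid escape, so construction throws.
let unicodeError = null;
try {
  new RegExp("\\<x\\>", "u");
} catch (err) {
  unicodeError = err;
}
console.log(unicodeError instanceof SyntaxError); // true

// Unescaped angle brackets behave the same in both modes.
console.log(/<x>/.test("<x>"), /<x>/u.test("<x>")); // true true
```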

Thanks for this super fast response! I'm just getting my head around your notes to make sure I understand it. – David Mar 29 '22 at 13:06
@WiktorStribiżew Am I right in thinking that this will fail if the URL has any of the non-captured characters in it? (<, > and |)? Hopefully this isn't a common issue but perhaps it could be a problem sometimes. Any ideas? – David Mar 29 '22 at 19:03
@David You cannot have `<` and `>` in a tag. It will be a corrupt code then, and that means no regex can match it. `|` is not a problem, it is optional. – Wiktor Stribiżew Mar 29 '22 at 19:15
@David Do you have any test cases that fail? Please update the question if so, and let me know whether the current solution is good enough. – Wiktor Stribiżew May 31 '22 at 07:35
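
One way to probe the edge cases raised in these comments is to run a few inputs through the answer's regex. This is only a sketch: the helper name `convert` and the sample inputs are made up for illustration. It suggests that a `|` inside the label is harmless, while a `|` inside the URL part is read as the separator.

```javascript
// Hypothetical helper wrapping the regex from the answer.
const convert = (s) =>
  s.replace(/<([^<>|]*)(?:\|([^<>]*))?>/g, (match, url, label) =>
    label !== undefined ? `[url=${url}]${label}[/url]` : `[url]${url}[/url]`);

// A "|" inside the label is fine: Group 2 only excludes "<" and ">".
console.log(convert("<www.url.com|a|b>"));
// [url=www.url.com]a|b[/url]

// A "|" inside the URL is not: the first "|" is taken as the separator.
console.log(convert("<www.url.com/x|y|label>"));
// [url=www.url.com/x]y|label[/url]

// An unclosed "<" never matches, so the text is left as it is.
console.log(convert("1 < 2"));
// 1 < 2
```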
