
I am trying to make a snippet that will take the clipboard contents (the text of a heading in a Markdown document) and transform it into a link to that section. For example, if my clipboard contains Some Heading - 20191107, then I want the following output:

[Some Heading - 20191107](filename.md#some-heading---20191107)

Here is my VS Code snippet for Markdown so far:

    "link to this section": {
        "prefix": "isection",
        "body": [
            "[${1:${CLIPBOARD}}](${TM_FILENAME}#${CLIPBOARD/ /-/g})"
        ],
        "description": "Insert link to section whose heading text is in the clipboard"
    }

This has the first transform, but what I cannot figure out is how to nest multiple transforms:

  • Replace all spaces with hyphens.
  • Change everything to lowercase.
  • Remove any characters matching [^a-z0-9-]
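Outside of snippet syntax, those three rules amount to a short function. The sketch below is plain Python, not VS Code's transform language, and `heading_anchor` is a hypothetical name. It applies the rules in the order listed. Lowercasing has to happen before the removal step, otherwise uppercase letters would be thrown away.

```python
import re


def heading_anchor(heading):
    """Build a Markdown heading anchor using the three rules listed above."""
    slug = heading.replace(" ", "-")        # 1. each space becomes a hyphen
    slug = slug.lower()                     # 2. lowercase everything
    slug = re.sub(r"[^a-z0-9-]", "", slug)  # 3. drop anything not a-z, 0-9 or -
    return slug


print(heading_anchor("Some Heading - 20191107"))
# some-heading---20191107
print(heading_anchor("20191107 - @#$%^& This is a section - 20191107"))
# 20191107----this-is-a-section---20191107
```

Both results match the anchors given in this question, including the four hyphens after 20191107 in the test case.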

Test Case

To clarify my test case for @Mark, in a markdown document in VS Code, I make a section heading such as:

# 20191107 - @#$%^& This is a section - 20191107

I then copy the text 20191107 - @#$%^& This is a section - 20191107 and run the snippet you fixed up for me. What it outputs is:

[20191107 - @#$%^& This is a section - 20191107](tips.tech.git.md#20191107----this-is-a-section---20191107)

Which is a valid link to the heading!

Robert Mark Bram
  • Ideally how would you like the output to look in your test case? I used `Some Heading - 20191107` from your original question to design the regex but it looks like `# 20191107 - @#$%^& This is a section - 20191107` is really what you are starting with? – Mark Nov 15 '19 at 20:23
  • That was just an example so that I can test what it will do with headings that have non-alphanumeric characters in it. It works perfectly btw - exactly what I was trying to create myself. The most important part is the way it creates the link part - within the round brackets. If you don't mind giving a bit of explanation on the regex, that would be most appreciated! – Robert Mark Bram Nov 16 '19 at 09:37



Here is a snippet that I believe meets all requirements (I have simplified this from an earlier answer of mine).

    "link to this section": {
        "prefix": "isection",
        "body": [
            "[${1:${CLIPBOARD}}](${TM_FILENAME}#${CLIPBOARD/([\\w-]+$)|([\\w-]+)|([-\\s]+)|([^\\w]+)/${1:/downcase}${2:/downcase}${2:+-}/gm})"
        ],
        "description": "Insert link to section whose heading text is in the clipboard"
    }

I will explain this part:

${CLIPBOARD/([\\w-]+$)|([\\w-]+)|([-\\s]+)|([^\\w]+)/${1:/downcase}${2:/downcase}${2:+-}/gm}

The main idea is to put each kind of text that needs different handling into its own capture group. With a regex alternation, each match fills only one of the groups. See the regex101 demo.

Then you can transform that group or ignore it without affecting any subsequent matches!
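To see the one-group-per-match behaviour outside regex101, Python's `re` module can run the same pattern once the JSON double backslashes are reduced to single ones. This is only a sketch: VS Code uses its own (JavaScript) regex engine, and the assumption here is that both engines treat these four alternatives identically.

```python
import re

# The snippet's regex; "\\w" in the JSON file is "\w" once unescaped.
PATTERN = re.compile(r"([\w-]+$)|([\w-]+)|([-\s]+)|([^\w]+)", re.MULTILINE)

# Each match yields a 4-tuple of groups; exactly one entry is not None.
for m in PATTERN.finditer("Some Heading - 20191107"):
    print(m.groups())
# (None, 'Some', None, None)
# (None, None, ' ', None)
# (None, 'Heading', None, None)
# (None, None, ' - ', None)
# ('20191107', None, None, None)
```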

It is an alternation of four capture groups:

  1. ([\\w-]+$) note the $, which marks the end of the line; this must be the first capture group
  2. ([\\w-]+) same as group 1, but not at the end of the line
  3. ([-\\s]+) captures runs of spaces and hyphens
  4. ([^\\w]+) captures runs of any other non-word characters, i.e. anything outside A-Za-z0-9_
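Running the same pattern over the test case from the question shows which of the four groups picks up each piece. This is a Python sketch with a hypothetical helper, `which_group`. `Match.lastindex` is the number of the last group that participated in a match, which with this alternation is the only one.

```python
import re

PATTERN = re.compile(r"([\w-]+$)|([\w-]+)|([-\s]+)|([^\w]+)", re.MULTILINE)


def which_group(text):
    """Return (group number, matched text) for every match in text."""
    return [(m.lastindex, m.group(m.lastindex)) for m in PATTERN.finditer(text)]


for group, piece in which_group("20191107 - @#$%^& This is a section - 20191107"):
    print(group, repr(piece))
```

Group 4 also swallows the space after `&`, because a space is a non-word character too. Only the final `20191107` lands in group 1.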

Capture group 1 gets the last run of word characters and hyphens, like 12345 or asdasd.

Capture group 2 gets the same kind of text as group 1, but only when it is not at the end of the line. This matters because a - is appended after group 2 but not after group 1, so no hyphen is added to the end.

Capture group 3 captures the spaces and hyphens. It will be ignored in the output.

Capture group 4 captures the remaining non-word characters (anything outside A-Za-z0-9_) and is also ignored.
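The reason the `$`-anchored group has to come first can be checked by swapping the first two alternatives. In this Python sketch, `to_anchor` and its parameters are hypothetical names. The hyphen is always attached to the unanchored `[\w-]+` alternative and never to the `$`-anchored one, whichever group number each ends up with.

```python
import re

TEXT = "Some Heading - 20191107"

# Order from the answer: the $-anchored alternative is tried first.
anchored_first = re.compile(r"([\w-]+$)|([\w-]+)|([-\s]+)|([^\w]+)", re.M)
# First two alternatives swapped: the unanchored one is tried first.
anchored_second = re.compile(r"([\w-]+)|([\w-]+$)|([-\s]+)|([^\w]+)", re.M)


def to_anchor(pattern, text, word_group, last_group):
    """Lowercase word runs; add "-" after word_group but not last_group."""
    def fmt(m):
        if m.group(last_group) is not None:
            return m.group(last_group).lower()
        if m.group(word_group) is not None:
            return m.group(word_group).lower() + "-"
        return ""  # whitespace, hyphens and punctuation are dropped
    return pattern.sub(fmt, text)


print(to_anchor(anchored_first, TEXT, word_group=2, last_group=1))
# some-heading-20191107
print(to_anchor(anchored_second, TEXT, word_group=1, last_group=2))
# some-heading-20191107-
```

In the swapped order the unanchored alternative consumes `20191107` before the `$` alternative gets a chance. As a result, the anchored group never matches and the anchor ends with a stray hyphen.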

Here is the output part of the transform: ${1:/downcase}${2:/downcase}${2:+-}. Notice there is no mention of groups 3 or 4, so they are discarded. They still have to be matched, though; otherwise they would pass through "un-transformed" and appear in the result, which we do not want.
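The point about unmatched text passing through untouched is easy to check: drop group 4 from the pattern and the punctuation survives. Below is a Python sketch with a hypothetical `transform` helper that mimics `${1:/downcase}${2:/downcase}${2:+-}`. Like the snippet transform, `re.sub` copies any text between matches into the result unchanged.

```python
import re

TEXT = "20191107 - @#$%^& This is a section - 20191107"

WITH_GROUP_4 = re.compile(r"([\w-]+$)|([\w-]+)|([-\s]+)|([^\w]+)", re.M)
WITHOUT_GROUP_4 = re.compile(r"([\w-]+$)|([\w-]+)|([-\s]+)", re.M)


def transform(pattern, text):
    """Mimic ${1:/downcase}${2:/downcase}${2:+-}; other groups emit nothing."""
    def fmt(m):
        g1, g2 = m.group(1), m.group(2)
        tail = g2.lower() + "-" if g2 is not None else ""
        return (g1 or "").lower() + tail
    return pattern.sub(fmt, text)


print(transform(WITH_GROUP_4, TEXT))
# 20191107-this-is-a-section-20191107
print(transform(WITHOUT_GROUP_4, TEXT))
# 20191107-@#$%^&this-is-a-section-20191107
```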

So groups 1 and 2 are lowercased; because of the alternation, a single match never contains both.

${2:+-} means: if there is a group 2, add a - after it. The very last match of the entire CLIPBOARD will be a group 1 (as long as the heading ends with a letter, digit, underscore, or hyphen), so no hyphen is appended to that last match.

Because of the g flag, the regex is applied repeatedly across the clipboard text, and each match captures only one of the four groups.
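Putting the pieces together, the whole transform can be mimicked in Python to check the outputs listed below. This is a sketch that assumes Python's `re` behaves like VS Code's regex engine for this pattern. `section_anchor` and `format_match` are hypothetical names, and `re.sub` stands in for the `g` flag by replacing every match.

```python
import re

# The snippet's regex with the JSON escaping removed; re.MULTILINE = "m" flag.
PATTERN = re.compile(r"([\w-]+$)|([\w-]+)|([-\s]+)|([^\w]+)", re.MULTILINE)


def format_match(m):
    """Mimic the format string ${1:/downcase}${2:/downcase}${2:+-}."""
    out = ""
    if m.group(1) is not None:
        out += m.group(1).lower()  # ${1:/downcase}
    if m.group(2) is not None:
        out += m.group(2).lower()  # ${2:/downcase}
        out += "-"                 # ${2:+-}: only when group 2 matched
    return out                     # groups 3 and 4 contribute nothing


def section_anchor(clipboard):
    """Apply the format to every match, as the g flag does."""
    return PATTERN.sub(format_match, clipboard)


print(section_anchor("Some Heading - 20191107"))
# some-heading-20191107
print(section_anchor("20191107 - @#$%^& This is a section - 20191107"))
# 20191107-this-is-a-section-20191107
```

The sketch also exposes an edge case. A heading that ends in punctuation, such as `Hello World!`, becomes `hello-world-`, because its last word is not at the end of the line and so lands in group 2.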


Input: Some Heading - 20191107
Output: [Some Heading - 20191107](fileName.ext#some-heading-20191107)

Input: 20191107 - @#$%^& This is a section - 20191107
Output: [20191107 - @#$%^& This is a section - 20191107](test-bed-snippets.code-snippets#20191107-this-is-a-section-20191107)


If you need more hyphens in the result, like:

[Some Heading - 20191107](filename.md#some-heading---20191107)

just take the hyphen out of the third capture group, making it ([\\s]+), which results in:

[20191107 - @#$%^& This is a section - 20191107](test-bed-snippets.code-snippets#20191107---this-is-a-section---20191107)
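A Python sketch of the transform with the hyphen removed from group 3 reproduces the anchor part of both results above. `anchor_keep_hyphens` is a hypothetical name, and Python's `re` is assumed to behave like VS Code's engine here. A lone hyphen in the heading now falls through to group 2, which emits it and then appends one more. Together with the hyphen after the preceding word, that gives three in a row.

```python
import re

# Third group is now [\s]+ (whitespace only), so lone hyphens reach group 2.
PATTERN = re.compile(r"([\w-]+$)|([\w-]+)|([\s]+)|([^\w]+)", re.MULTILINE)


def anchor_keep_hyphens(clipboard):
    """Mimic ${1:/downcase}${2:/downcase}${2:+-} with the modified regex."""
    def fmt(m):
        if m.group(1) is not None:
            return m.group(1).lower()
        if m.group(2) is not None:
            return m.group(2).lower() + "-"
        return ""  # whitespace and other non-word runs are dropped
    return PATTERN.sub(fmt, clipboard)


print(anchor_keep_hyphens("Some Heading - 20191107"))
# some-heading---20191107
print(anchor_keep_hyphens("20191107 - @#$%^& This is a section - 20191107"))
# 20191107---this-is-a-section---20191107
```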

Mark
  • Thanks for the reply @Mark. In answer to your question, the text may include any character at all. The rule for creating the anchor tag for a Markdown heading is: 1) replace all spaces with hyphens, 2) lowercase everything, 3) remove every character that is not alphanumeric or a hyphen. – Robert Mark Bram Nov 07 '19 at 08:20
  • Wow, that works, and it makes my eyes cross slightly when I try to read it. :D Would love an explanation. For a test with this text (from a Markdown header): `20191107 - @#$%^& This is a section - 20191107` it generated this Markdown: `[20191107 - @#$%^& This is a section - 20191107](notes.md#20191107----this-is-a-section---20191107)`, which correctly linked to the heading in VS Code. – Robert Mark Bram Nov 07 '19 at 10:45
  • Not sure I quite follow your test case. `20191107 - @#$%^& This is a section - 20191107` that whole thing is a test case? And the transform works as you want on that test case? – Mark Nov 07 '19 at 23:57
  • Updated the test case in my question to hopefully explain it! – Robert Mark Bram Nov 13 '19 at 06:10
  • It would be great if this answer could be adapted to better explain the generic part of the question: "write snippet with multiple transforms". At the moment, I have to decipher the regex myself. Hence this answer is just a working example from my perspective. – Romain Vincent Aug 22 '21 at 11:23
  • Robert and @RomainVincent, I have finally gotten around to explaining my answer (and simplified the transform in the process). Let me know if it works as you expect or if you don't understand something. – Mark Aug 25 '21 at 02:46
  • Thanks for adding the explanations, that's very nice of you. I don't think I understand why group 1 must be first. Would it never be matched otherwise? – Romain Vincent Aug 27 '21 at 07:47
  • @Romain If `([\\w-]+$)` is not first then the last sequence of characters, like `20191107` in the OP's example, will be matched by `([\\w-]+)` We need them in different groups so we can use `${2:+-}` to only add the hyphen to groups that are **not** at the end. – Mark Aug 30 '21 at 21:06
  • @Romain If `([\\w-]+)` is at the beginning, it will match that last group up until the `$` and then `([\\w-]+$)` wouldn't match anything as all the preceding text before the `$` had already been matched. – Mark Aug 30 '21 at 21:09