
I am trying to reference groups past the 9th backreference in Notepad++. The wiki says that I can use named groups to get past the 9th reference, but I can't get the syntax right to make the match. I am starting with just two groups to keep it simple.

Sample Data

1000,1000

Regex.

(?'a'[0-9]*),([0-9]*)

According to the docs I need to do the following.

(?<some name>...), (?'some name'...),(?(some name)...)
Names this group some name.

However, the result is that it can't find my text. Any suggestions?

Steven Combs
  • ouch...9 back-references? Are you sure you aren't maybe over-complicating something? – CrayonViolent Jun 06 '12 at 02:31
  • Not at all, I am restoring database data, and using Notepad++ to format the insert statements. – Steven Combs Jun 06 '12 at 02:33
  • In that case, why not just generate the insert statements via a scripting language? –  Jun 06 '12 at 02:36
  • I guess I could, but I have tried this before with Notepad++ and couldn't get past the 9th reference. So now I am just trying to make it happen. – Steven Combs Jun 06 '12 at 02:38

4 Answers

You can reference groups above 9 in the same way as groups 1 through 9,

i.e., $10 is the tenth group.

For (naive) example:

String:

abcdefghijklmnopqrstuvwxyz

Regex find:

(?:a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)

Replace:

$10

Result:

kqrstuvwxyz

My test was performed in Notepad++ v6.1.2 and gave the result I expected.

Update: This still works as of v7.5.6
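
For anyone who wants to check the group numbering outside the editor, here is a minimal sketch in Python. This is an assumption-laden stand-in, not Notepad++ itself: Python's `re` module is a different engine and writes the replacement reference as `\g<10>` rather than `$10`, but the groups are counted the same way.

```python
import re

text = "abcdefghijklmnopqrstuvwxyz"

# (?:a) is non-capturing, so "b" is group 1 and "k" is group 10.
pattern = r"(?:a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)"

# \g<10> is Python's unambiguous way of writing "group 10" in a replacement.
result = re.sub(pattern, r"\g<10>", text)
print(result)
```

The match covers "a" through "p" and is replaced by the tenth group alone, which mirrors the Find/Replace test above.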


SarcasticSully resurrected this to ask the question:

"What if you want to replace with the 1st group followed by the character '0'?"

To do this change the replace to:

$1\x30

This replaces the match with group 1 followed by the character with hex code 30, which is 0 in ASCII.
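
The same "group 1 followed by a literal 0" problem comes up in other engines too. As a rough comparison only (Python's `re`, not the Notepad++ engine; the sample text is made up), Python separates the group number from the following digit with the `\g<1>` form:

```python
import re

# Hypothetical sample text, used only to illustrate the replacement.
sample = "price 125"

# \g<1>0 means "group 1, then the literal character 0".
tenfold = re.sub(r"([0-9]+)", r"\g<1>0", sample)
print(tenfold)
```

The design point is the same in both tools: the reference to the group has to end before the digit starts, whether via `\x30` or an explicit delimiter.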

BunjiquoBianco

A very belated answer to help others who land here from Google (as I did). Named backreferences in Notepad++ substitutions look like this: $+{name}. For whatever reason.

There's a gotcha here, though: named groups are also assigned numbers. In flavors such as .NET, where named groups are numbered after the unnamed ones, (.*)(?<name> & )(.*) would be replaced with $1${name}$2 to reproduce the line you started with. In Notepad++, you would have to use $1$+{name}$3.
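
Python's `re` module happens to number named groups the same way, so it can serve as a quick illustration of the gotcha. This is only a sketch in a different engine: Python spells the group `(?P<name>...)` and the reference `\g<name>`, and the input string is invented for the example.

```python
import re

pattern = re.compile(r"(.*)(?P<name> & )(.*)")

# The named group sits between two unnamed groups, so it takes number 2.
index_of_name = pattern.groupindex["name"]

match = pattern.match("left & right")

# The third group is therefore number 3, not 2.
rebuilt = match.expand(r"\g<1>\g<name>\g<3>")
print(index_of_name, repr(match.group(2)), rebuilt)
```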


Example: I needed to clean up a Visual Studio .sln file for mismatched configurations. The text I needed to replace looked like this:

    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|Any CPU.ActiveCfg = Debug|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|Any CPU.Build.0 = Debug|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x64.ActiveCfg = Debug|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x64.Build.0 = Debug|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x86.ActiveCfg = Debug|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x86.Build.0 = Debug|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|Any CPU.ActiveCfg = Release|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|Any CPU.Build.0 = Release|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|x64.ActiveCfg = Release|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|x64.Build.0 = Release|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|x86.ActiveCfg = Release|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|x86.Build.0 = Release|Any CPU


My search RegEx:

  ^(\s*\{[^}]*\}\.)(?<config>[a-zA-Z0-9]+\|[a-zA-Z0-9 ]+)*(\..+=\s*)(.*)$

My replacement string:

  $1$+{config}$3$+{config}

The result:

    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|Any CPU.ActiveCfg = Dev|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|Any CPU.Build.0 = Dev|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x64.ActiveCfg = Dev|x64
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x64.Build.0 = Dev|x64
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x86.ActiveCfg = Dev|x86
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x86.Build.0 = Dev|x86
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|Any CPU.ActiveCfg = QA|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|Any CPU.Build.0 = QA|Any CPU
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|x64.ActiveCfg = QA|x64
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|x64.Build.0 = QA|x64
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|x86.ActiveCfg = QA|x86
    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|x86.Build.0 = QA|x86
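
If you would rather script this than use the editor, the same search and replacement can be approximated in Python. Treat this as a sketch under assumptions: the group is written `(?P<config>...)` and referenced as `\g<config>` because that is Python's syntax, and `fix_line` is a helper name made up for the example.

```python
import re

# Same pattern as above, with the named group in Python's (?P<name>...) form.
SEARCH = re.compile(
    r"^(\s*\{[^}]*\}\.)(?P<config>[a-zA-Z0-9]+\|[a-zA-Z0-9 ]+)*(\..+=\s*)(.*)$"
)
# Group 1, the config, group 3, then the config again in place of group 4.
REPLACE = r"\g<1>\g<config>\g<3>\g<config>"


def fix_line(line: str) -> str:
    """Copy the left-hand configuration onto the right-hand side."""
    return SEARCH.sub(REPLACE, line)


lines = [
    "    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.Dev|x64.ActiveCfg = Debug|Any CPU",
    "    {CDDB12FE-885F-4FB7-9724-1A4279573DE5}.QA|Any CPU.Build.0 = Release|Any CPU",
]
fixed = [fix_line(line) for line in lines]
for line in fixed:
    print(line)
```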

Hope this helps someone.

James King

OK, matching is no problem; your example matches for me in the current Notepad++. This is an important point: to use PCRE-style regex in Notepad++, you need version 6.0 or later.

The other point is, where do you want to use the backreference? I can use named backreferences without problems within the regex, but not in the replacement string.

That means

(?'a'[0-9]*),([0-9]*),\g{a}

will match

1000,1001,1000
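
A named backreference inside the pattern behaves the same way in most engines. As a hedged cross-check in Python (where the group is written `(?P<a>...)` and the backreference `(?P=a)` instead of `\g{a}`), the third field must repeat the first:

```python
import re

# (?P=a) requires the exact text that group "a" captured earlier.
pattern = re.compile(r"(?P<a>[0-9]*),([0-9]*),(?P=a)")

same = pattern.fullmatch("1000,1001,1000")       # first and third fields agree
different = pattern.fullmatch("1000,1001,1002")  # third field differs

print(same is not None, different is None)
```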

But I don't know a way to use named groups or groups > 9 in the replacement string.

Do you really need more than 9 backreferences in the replacement string? If you just need more than 9 groups, but not all of them in the replacement, then make the groups you don't need to reuse non-capturing groups, by adding a ?: at the start of the group.

(?:[0-9]*),([0-9]*),(?:[0-9]*),([0-9]*)
           group 1             group 2
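
To see the numbering that results from the non-capturing groups, the pattern can be run unchanged in Python (a sketch in another engine, with made-up input values):

```python
import re

pattern = re.compile(r"(?:[0-9]*),([0-9]*),(?:[0-9]*),([0-9]*)")

# Hypothetical input: four comma-separated numbers.
match = pattern.match("10,20,30,40")

# Only the second and fourth fields are captured, as groups 1 and 2.
captured = match.groups()
print(captured)
```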
stema
  • I think I could have made this work also. This is a great answer as well. When I saw yours, I started thinking about group nesting which would have also worked in this scenario. – Steven Combs Jun 06 '12 at 20:29

The usual syntax of referencing groups with \x will interpret \10 as a reference to group 1 followed by a 0.
Instead, you need to use the alternative $x syntax and write $10.
The question here is about Notepad++, but if you ever have to do this in Bash, don't forget to escape the expression as \$10.

Note: Some people seem to doubt there's ever any reason to have 10 groups.
I have a simple one. I wanted to rename a group of files named
<name_start>DDMMYYYY_TIME_DDMMYYYY_TIME<name_end> as <name_start>YYYYMMDD_TIME_YYYYMMDD_TIME<name_end>,
and ended up replacing my input matches with:
rename "\1" "\2\5\4\3_\6_\9\8\7_$10" since name_start and name_end were not always constant.
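
For comparison, here is roughly how the same date reordering could look as a Python script. This is a sketch, not the command used above: the pattern, the group layout (prefix, two dates, two times, suffix) and the file name are all assumptions invented for illustration, and Python writes the tenth group as `\g<10>`.

```python
import re

# Groups: 1 prefix, 2-4 DD MM YYYY, 5 TIME, 6-8 DD MM YYYY, 9 TIME, 10 suffix.
PATTERN = re.compile(
    r"^(.*?)(\d{2})(\d{2})(\d{4})_(\d+)_(\d{2})(\d{2})(\d{4})_(\d+)(.*)$"
)
# Reorder each date to YYYYMMDD; \g<10> is the tenth group (the suffix).
REPLACEMENT = r"\g<1>\g<4>\g<3>\g<2>_\g<5>_\g<8>\g<7>\g<6>_\g<9>\g<10>"

# Hypothetical file name in the DDMMYYYY_TIME_DDMMYYYY_TIME layout.
old_name = "report_06062012_0231_07062012_1200.log"
new_name = PATTERN.sub(REPLACEMENT, old_name)
print(new_name)
```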

jmd