How to replace only first n matching instances in TCL regexp?

Question

I need to replace the first 50 occurrences of abc with bcd. I have tried the code below, but it does not work.

set a "1 abc 2 abc 3 abc 4 abc......... 100 abc"
regsub -all "(.*?(abc).*)(50)" $a "bcd \1" b
puts $b

The numbers in the string are for demonstration purposes only. The string can be arbitrary:

set a "hh abc cc abc hh abc cc abc dd abc hh abc......... hh abc"

@strib: Not a duplicate. The solution over there is less than ideal. Let's leave this question open. — nhahtdh, May 26 '15 at 06:59
[Tcl regsub](https://www.tcl.tk/man/tcl8.4/TclCmd/regsub.htm) does not allow replacing exactly n occurrences. It replaces either the first match or, with `-all`, every match, which is why a single regsub call cannot be used here. — Wiktor Stribiżew, May 26 '15 at 08:47

Jerry · Answer 1 · 2015-05-26T07:49:55.400

You can use this custom proc, which calls a helper command for each match to decide whether to replace it:

set a "1 abc 2 abc 3 abc 4 abc......... 100 abc"

proc rangeSub {a first last string sub} {
  # This variable keeps the count of matches
  set count 0
  proc re_sub {str first last rep} {
    upvar count count
    incr count
    # If match number within wanted range, replace with rep, else return original string
    if {$count >= $first && $count <= $last} {
      return $rep
    } else {
      return $str
    }
  }

  set cmd {[re_sub "\0" $first $last $sub]}
  set b [subst [regsub -all "\\y$string\\y" $a $cmd]]

  return $b
}

# Here replacing the 1st to 3rd occurrences of abc
puts [rangeSub $a 1 3 "abc" "bcd"]
# => 1 bcd 2 bcd 3 bcd 4 abc......... 100 abc
puts [rangeSub $a 2 3 "abc" "bcd"]
# => 1 abc 2 bcd 3 bcd 4 abc......... 100 abc

Change the call to rangeSub $a 1 50 "abc" "bcd" to replace the first 50 occurrences.
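
One caveat with the proc above: `subst` processes the whole string, so any `[`, `$` or backslash in the input itself would also be substituted. A minimal sketch of a variant that backslash-escapes those characters first is shown below. The names `rangeSubEscaped` and `pickNth` are hypothetical, and the sketch assumes Tcl 8.5 or later and that the search term is a plain word with no regex metacharacters.

```tcl
# Hypothetical helper: returns rep for matches number first..last, else the match itself
proc pickNth {match first last rep} {
    # count lives in the caller (rangeSubEscaped), because subst runs the
    # embedded command in that proc's frame
    upvar 1 count count
    incr count
    if {$count >= $first && $count <= $last} {
        return $rep
    }
    return $match
}

# Hypothetical variant of rangeSub that protects the input from subst
proc rangeSubEscaped {text first last word rep} {
    set count 0
    # Put a backslash in front of [ ] $ and \ so subst turns them back into themselves
    regsub -all {[][$\\]} $text {\\&} escaped
    # Wrap every whole-word match in a call to the counting helper
    set marked [regsub -all "\\y$word\\y" $escaped {[pickNth "&" $first $last $rep]}]
    return [subst $marked]
}

puts [rangeSubEscaped {abc [expr 1+1] $x abc} 1 1 abc bcd]
# => bcd [expr 1+1] $x abc
```

The bracketed command and the `$x` in the input come through unchanged, because the escaping step neutralises them before `subst` runs.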


Alternative using indices and string range:

set a "1 abc 2 abc 3 abc 4 abc......... 100 abc"

proc rangeSub {a first last string sub} {
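  # Index pairs {start end} of every whole-word match of $string
  set idx [regexp -all -inline -indices "\\y$string\\y" $a]
  # Start of match number $first and end of match number $last
  set start [lindex $idx $first-1 0]
  set end [lindex $idx $last-1 1]
  regsub -all -- "\\y$string\\y" [string range $a $start $end] $sub result
  return [string range $a 0 $start-1]$result[string range $a $end+1 end]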
}

puts [rangeSub $a 1 3 abc bcd]
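
The same index-based idea can also be written as an explicit loop that walks the string one match at a time with `regexp -indices -start` and stops after match number `last`. This is only a sketch under stated assumptions: `replaceRange` is a hypothetical name, the replacement is inserted literally (no `&` or `\1` handling), and Tcl 8.5 or later is assumed for `lassign`.

```tcl
# Hypothetical helper: replace matches number first..last of pattern with rep
proc replaceRange {text first last pattern rep} {
    set result ""
    set pos 0     ;# index where the next search starts
    set count 0   ;# number of matches seen so far
    while {$count < $last &&
           [regexp -indices -start $pos -- $pattern $text span]} {
        lassign $span s e
        if {$e < $s} break   ;# empty match: stop instead of looping forever
        incr count
        # Copy the text between the previous match and this one unchanged
        append result [string range $text $pos [expr {$s - 1}]]
        if {$count >= $first} {
            append result $rep
        } else {
            append result [string range $text $s $e]
        }
        set pos [expr {$e + 1}]
    }
    # Everything after the last handled match is kept as is
    append result [string range $text $pos end]
    return $result
}

puts [replaceRange "1 abc 2 abc 3 abc 4 abc" 1 3 abc bcd]
# => 1 bcd 2 bcd 3 bcd 4 abc
puts [replaceRange "1 abc 2 abc 3 abc 4 abc" 2 3 abc bcd]
# => 1 abc 2 bcd 3 bcd 4 abc
```

Because nothing is passed through `subst`, brackets and dollar signs in the input need no special treatment here.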

@Bharathi No, it is not possible. re_syntax has no way to restrict replacement based on the matches. — Jerry, May 26 '15 at 07:31

Dinesh · Answer 2 · 2015-05-26T08:59:50.733

This can be done by using regexp and regsub together.

%
% set count 0
0
% # Don't bother about this 'for' loop. It is just for input generation
% for { set i 65} {$i < 123} {incr i} {
        if {$count == 101} break
        if { $i >= 90 && $i <=96} {
                continue
        }
        for { set j 65 } {$j < 123} {incr j} {
                if {$count == 101} break
                if { $j >= 90 && $j <=96} {
                        continue
                }
                incr count
                append input "[format %c%c $i $j] abc "


        }
}
%
% # The 'input' value below is what gets processed
% # So concentrate only on what follows :D
% set input
AA abc AB abc AC abc AD abc AE abc AF abc AG abc AH abc AI abc AJ abc AK abc AL abc AM abc AN abc AO abc AP abc AQ abc AR abc AS abc AT abc AU abc AV abc AW abc AX abc AY abc Aa abc Ab abc Ac abc Ad abc Ae abc Af abc Ag abc Ah abc Ai abc Aj abc Ak abc Al abc Am abc An abc Ao abc Ap abc Aq abc Ar abc As abc At abc Au abc Av abc Aw abc Ax abc Ay abc Az abc BA abc BB abc BC abc BD abc BE abc BF abc BG abc BH abc BI abc BJ abc BK abc BL abc BM abc BN abc BO abc BP abc BQ abc BR abc BS abc BT abc BU abc BV abc BW abc BX abc BY abc Ba abc Bb abc Bc abc Bd abc Be abc Bf abc Bg abc Bh abc Bi abc Bj abc Bk abc Bl abc Bm abc Bn abc Bo abc Bp abc Bq abc Br abc Bs abc Bt abc Bu abc Bv abc Bw abc Bx abc By abc
%
% regexp "(.*?abc.*?){50}" $input match; # First, match the text up to the 50th occurrence
1
% regsub -all "((.*?)abc.*?)" $match "\\2bcd" replaceText; #Replacing the 'abc' with 'bcd'
50
% set replaceText
% regsub $match $input $replaceText output; # Finally, replace this content in the main input
1
% 
% set output
AA bcd AB bcd AC bcd AD bcd AE bcd AF bcd AG bcd AH bcd AI bcd AJ bcd AK bcd AL bcd AM bcd AN bcd AO bcd AP bcd AQ bcd AR bcd AS bcd AT bcd AU bcd AV bcd AW bcd AX bcd AY bcd Aa bcd Ab bcd Ac bcd Ad bcd Ae bcd Af bcd Ag bcd Ah bcd Ai bcd Aj bcd Ak bcd Al bcd Am bcd An bcd Ao bcd Ap bcd Aq bcd Ar bcd As bcd At bcd Au bcd Av bcd Aw bcd Ax bcd Ay bcd Az abc BA abc BB abc BC abc BD abc BE abc BF abc BG abc BH abc BI abc BJ abc BK abc BL abc BM abc BN abc BO abc BP abc BQ abc BR abc BS abc BT abc BU abc BV abc BW abc BX abc BY abc Ba abc Bb abc Bc abc Bd abc Be abc Bf abc Bg abc Bh abc Bi abc Bj abc Bk abc Bl abc Bm abc Bn abc Bo abc Bp abc Bq abc Br abc Bs abc Bt abc Bu abc Bv abc Bw abc Bx abc By abc
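
The last step of the session passes `$match` to regsub as a pattern, so any regex metacharacters in the input would be interpreted rather than matched literally. A rough sketch of the same prefix idea using `regexp -indices` and `string replace` instead is shown below. `replaceFirstN` is a hypothetical name, the sketch assumes Tcl 8.5 or later and a pattern without greedy quantifiers of its own, and, unlike the session above, it returns the input unchanged when fewer than n matches exist.

```tcl
# Hypothetical helper: replace the first n matches of pattern with replacement
proc replaceFirstN {input pattern replacement n} {
    # Non-greedy .*? makes the overall match end right after the n-th occurrence
    set prefixRe "(?:.*?(?:$pattern)){$n}"
    if {![regexp -indices -- $prefixRe $input span]} {
        return $input   ;# fewer than n matches: nothing is changed
    }
    lassign $span start end
    # Replace every occurrence inside the matched prefix only
    set prefix [string range $input $start $end]
    set replaced [regsub -all -- $pattern $prefix $replacement]
    # Splice the rewritten prefix back in by position, not by pattern
    return [string replace $input $start $end $replaced]
}

puts [replaceFirstN "p abc q abc r abc" abc bcd 2]
# => p bcd q bcd r abc
```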

Note: You are using \1 to refer to the first capture group, but you are writing it inside double quotes, which is wrong. Tcl performs backslash substitution inside double quotes, so regsub never sees the backslash. Inside braces \1 is fine, but inside double quotes the backslash must be escaped, as \\1.
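
The quoting difference described in the note can be checked directly. The snippet below is a small illustration with made-up variable names, not part of the original answer:

```tcl
set quoted "\1"    ;# backslash substitution: one character, U+0001
set escaped "\\1"  ;# a backslash followed by 1
set braced {\1}    ;# braces suppress backslash substitution

puts [string length $quoted]           ;# 1
puts [string length $escaped]          ;# 2
puts [string equal $escaped $braced]   ;# 1

# Both of these reach regsub as \1 and refer to the first capture group
regsub {(abc)} "x abc y" "<\\1>" fromQuotes
regsub {(abc)} "x abc y" {<\1>} fromBraces
puts $fromQuotes   ;# x <abc> y
puts $fromBraces   ;# x <abc> y
```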
