31

I'm not quite sure I understand how non-capturing groups work. I am looking for a regex to produce this result: 5.214. I thought the regex below would work, but it is replacing everything including the non-capture groups. How can I write a regex to only replace the capture groups?

"5,214".gsub(/(?:\d)(,)(?:\d)/, '.')
# => ".14"

My desired result:

"5,214".gsub(some_regex)
#=> "5.214"
diasks2
  • 2
    If this is the case, why not just replace the comma? – hwnd Oct 20 '14 at 01:28
  • I might have a longer string where commas are used in a different context. ex: `I went to the store, and then I bought something for $1,200.` I only want to replace commas that match the following: `\d,\d` – diasks2 Oct 20 '14 at 01:33
  • 1
    If you have decimal points as well in the string, that would be mixed up. If you want to change the locale to a language that uses `.` for thousands, then you are asking an XY-problem. Using a locale library would probably give you a more robust solution than trying it from scratch. – sawa Oct 20 '14 at 01:37

7 Answers

32

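Non-capturing groups still consume the text they match, so that text is part of the match gsub replaces.
Use
"5,214".gsub(/(\d+)(,)(\d+)/, '\1.\3')
or (note that Ruby does not allow a variable-length pattern such as \d+ inside a lookbehind, so a single \d is used)
"5,214".gsub(/(?<=\d)(,)(?=\d)/, '.')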

alpha bravo
29

You can't. gsub replaces the entire match; the groups only matter if the replacement refers to them (for example with '\1'). Whether the groups are capturing or not makes no difference to what gets replaced.

In order to achieve the result, you need to use lookbehind and lookahead.

"5,214".gsub(/(?<=\d),(?=\d)/, '.')
sawa
  • 7
    I personally prefer [alpha bravo's answer](https://stackoverflow.com/users/2716792/alpha-bravo). IMO it is much easier to implement for non-trivial regexes: `"5,214".gsub(/(\d+)(,)(\d+)/, '\1.\3')`. It defines three groups, (before)(what you need to replace)(after), and then replaces the match with (before)(your replacement)(after). – nacho4d Nov 17 '17 at 05:00
23

It is also possible to use Regexp.last_match (also available via $~) in the block version to get access to the full MatchData:

"5,214".gsub(/(\d),(\d)/) { |_|
    match = Regexp.last_match

    "#{match[1]}.#{match[2]}"
}

This scales better to more involved use-cases.
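
For instance, a minimal sketch with named groups, assuming a string that contains several numbers (the group names `left` and `right` and the variable `converted` are hypothetical, chosen only for illustration):

```ruby
# Named groups keep the block readable as the pattern grows.
# The block runs once per match, and Regexp.last_match returns the
# MatchData for the match currently being replaced.
converted = "5,214 and 1,200".gsub(/(?<left>\d),(?<right>\d)/) do
  m = Regexp.last_match
  "#{m[:left]}.#{m[:right]}"
end

puts converted
```

Because this pattern consumes the digits on both sides of the comma, a digit cannot serve two adjacent commas at once (as in "1,2,3"); the lookaround versions in the other answers do not have that limitation.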

Nota bene, from the Ruby docs:

the ::last_match is local to the thread and method scope of the method that did the pattern match.

Raphael
8

gsub replaces the entire match that the regular expression engine produces. Neither capturing nor non-capturing groups are kept out of that match. However, you can use lookaround assertions (or \K), which do not "consume" any characters of the string.

"5,214".gsub(/\d\K,(?=\d)/, '.')

Explanation: The \K escape sequence resets the starting point of the reported match and any previously consumed characters are no longer included. That being said, we then look for and match the comma, and the Positive Lookahead asserts that a digit follows.
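
To make the effect of \K visible, here is a small sketch that compares the reported match with and without it (the variable names `plain` and `reset` are only illustrative):

```ruby
# Without \K, the leading digit is part of the reported match.
plain = "5,214".match(/\d,(?=\d)/)

# With \K, the digit must still be present, but the reported match
# starts after it, at the comma.
reset = "5,214".match(/\d\K,(?=\d)/)

puts plain[0].inspect  # the digit and the comma
puts reset[0].inspect  # the comma only
```

Since gsub replaces only the reported match, the second pattern lets the digit survive the replacement.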

hwnd
  • 2
    Your [Explain Regular Expressions tool](http://liveforfaith.com/re/explain.pl) listed in your profile is very useful and cool. Thanks for creating that. – diasks2 Oct 20 '14 at 01:47
1

I know nothing about Ruby.

But from what I see in the tutorial,

gsub means replace, so the pattern should be /(?<=\d),(?=\d)/ to replace just the comma with a dot. Or use captures: match /(\d+),(\d+)/ and replace it with '\1.\2'.

H2Ojile
1

You can easily reference capture groups in the replacement string (second argument) like so:

"5,214".gsub(/(\d+)(,)(\d+)/, '\1.\3')
#=> "5.214"
  • \0 will be replaced by the whole matched string.
  • \1 will be replaced by the first capturing group.
  • \2 will be replaced by the second capturing group, and so on.

You could rewrite the example above using a non-capturing group for the , char.

"5,214".gsub(/(\d+)(?:,)(\d+)/, '\1.\2')
#=> "5.214"

As you can see, the part after the comma is now the second capturing group, since we defined the middle group as non-capturing.

That said, the non-capturing group is pointless in this case. You can just omit the group around the , altogether:

"5,214".gsub(/(\d+),(\d+)/, '\1.\2')
#=> "5.214"
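
One detail worth checking in the examples above is the quoting of the replacement string. A short sketch, with illustrative variable names, of how single and double quotes differ in Ruby:

```ruby
# In single quotes, \1 reaches gsub unchanged and acts as a backreference.
single = "5,214".gsub(/(\d+),(\d+)/, '\1.\2')

# In double quotes the backslash has to be escaped to get the same effect.
double = "5,214".gsub(/(\d+),(\d+)/, "\\1.\\2")

# Unescaped in double quotes, "\1" is a string escape (a control character),
# not a backreference, so the captured digits are lost.
wrong = "5,214".gsub(/(\d+),(\d+)/, "\1.\2")

puts single
puts double
puts wrong.inspect
```

Sticking to single-quoted replacement strings avoids the problem.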
0

You don't need regexp to achieve what you need:

'1,200.00'.tr('.','!').tr(',','.').tr('!', ',')

  • Periods become bangs (1,200!00)
  • Commas become periods (1.200!00)
  • Bangs become commas (1.200,00)
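
The same swap can probably be done in a single pass, since String#tr maps each character of its first argument to the character at the same position in the second. A minimal sketch (the variable name `swapped` is only illustrative):

```ruby
# '.' maps to ',' and ',' maps to '.', character by character,
# so no temporary placeholder such as '!' is needed.
swapped = '1,200.00'.tr('.,', ',.')

puts swapped
```

Like the three-step version, this swaps every comma and period in the string, not only those between digits.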
Cyber Oliveira