0

I'm trying to write a regex that will strip out $ and , from a value and not match at all if there are any other non-numerics.

$100 -> 100
$12,203.00 -> 12203.00
12JAN2022 -> no match

I have gotten sort of close with this:

^(?:[$,]*)(([0-9.]{1,3})(?:[,.]?))+(?:[$,]*)$

However, this doesn't properly capture the numeric value in $1, because the repeated digit groups are stored as separate sub-captures, as you can see here: https://regex101.com/r/4bOJtB/1

Dan
  • Do you really want to _replace and match_ with _a single regex_? Or, maybe what you want is `decimal.Parse("$12,345,678.90", System.Globalization.NumberStyles.Currency)`? – SGKoishi Apr 26 '22 at 06:49
  • What is your current code? – Wiktor Stribiżew Apr 29 '22 at 18:19
  • I did want to match, then replace with a single regex. The decimal parse is somewhat helpful, but I was really trying to figure out if anyone knew how a nested capture group could be collected as one. I really want to pull the results of the middle group that contains the [0-9] and non-captures the [,.], but I'm not sure it's possible. – Dan May 02 '22 at 22:02

2 Answers

1

You can use a named capturing group to capture all parts of the number and then concatenate them. That said, it is more straightforward to remove the characters you do not need as a post-processing step.

Here is some example code:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = @"^\$*(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+))(?<v>\.\d+)?$";
var tests = new[] {"$100", "$12,203.00", "12JAN2022"};
foreach (var test in tests) {
    // Join every capture of group "v"; a failed match has no captures, so the result is empty.
    var result = string.Concat(Regex.Match(test, pattern)
            .Groups["v"].Captures.Cast<Capture>().Select(x => x.Value));
    Console.WriteLine("{0} -> {1}", test, result.Length > 0 ? result : "No match");
}

See the C# demo. Output:

$100 -> 100
$12,203.00 -> 12203.00
12JAN2022 -> No match

The regex is

^\$*(?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+))(?<v>\.\d+)?$

See the regex demo. Details:

  • ^ - start of string
  • \$* - zero or more dollar symbols
  • (?:(?<v>\d{1,3})(?:,(?<v>\d{3}))*|(?<v>\d+)) - either one to three digits (captured into Group "v") and then zero or more occurrences of a comma and then three digits (captured into Group "v"), or one or more digits (captured into Group "v")
  • (?<v>\.\d+)? - an optional occurrence of . and one or more digits (all captured into Group "v")
  • $ - end of string.
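
For comparison, the post-processing route mentioned at the top of this answer might look like the following. This is only a minimal sketch: the class name `CurrencyCleaner` and the method `TryClean` are made up for illustration, and the pattern is the one above with the named groups collapsed into a single ordinary group.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class CurrencyCleaner
{
    // Same shape as the pattern above, but one plain group wraps the whole
    // number (commas included) instead of several captures of "v".
    private static readonly Regex Amount = new Regex(
        @"^\$*((?:\d{1,3}(?:,\d{3})*|\d+)(?:\.\d+)?)$");

    // Returns true and the number without "$" or "," when the input matches;
    // returns false and an empty string otherwise.
    public static bool TryClean(string input, out string cleaned)
    {
        var m = Amount.Match(input);
        // Post-processing step: remove the commas from the single captured value.
        cleaned = m.Success ? m.Groups[1].Value.Replace(",", "") : string.Empty;
        return m.Success;
    }
}
```

Calling `CurrencyCleaner.TryClean("$12,203.00", out var value)` returns true with `value` set to `12203.00`, while `12JAN2022` returns false. The regex only validates and isolates the number here; the comma removal is plain string work.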
Wiktor Stribiżew
  • This is close, I was just hoping there was a way that ${v} would be everything but it is still multiple submatches v.1, v.2 etc. – Dan May 09 '22 at 14:50
  • @Dan You cannot capture/match (that is, grab into a single group value) *non-adjoining*, non-neighboring string parts. That is not how **any** regex engine works. – Wiktor Stribiżew May 09 '22 at 14:54
0

I don't know how to achieve this in a single regexp, but in my opinion dividing the problem into smaller steps is a good idea: it is easier to implement, and easier to maintain and understand later, without spending time deciphering regex magic.

  1. replace all $ and , with an empty string: [\$\,] => ``

  2. match only digits and periods as a capture group (of course you may need to align this with your requirements on allowed period locations etc.) ^((\d{1,3}\.?)+)$
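
In C#, the two steps above could be sketched roughly as follows. The names `TwoStepCleaner` and `TryClean` are hypothetical, and the validation pattern is my own assumption: it is a stricter stand-in for the one in step 2 (digits with at most one fractional part), so adjust it to whatever period placement you actually allow.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TwoStepCleaner
{
    public static bool TryClean(string input, out string cleaned)
    {
        // Step 1: drop every "$" and "," wherever they occur.
        var stripped = Regex.Replace(input, @"[$,]", "");

        // Step 2: accept only digits with an optional fractional part.
        // Assumption: stricter than ^((\d{1,3}\.?)+)$, which also allows
        // several periods (for example "1.2.3").
        var ok = Regex.IsMatch(stripped, @"^\d+(?:\.\d+)?$");

        cleaned = ok ? stripped : string.Empty;
        return ok;
    }
}
```

One trade-off to be aware of: because step 1 removes the symbols blindly, an oddly formatted input such as `1,2,3$` is accepted and becomes `123`. If the position of `$` and `,` matters, validate before stripping instead.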

Hope this helps!

roten