
Can someone please make the gawk line below work? The regex uses PCRE syntax. I need to allow for some non-numeric characters preceding the first number, i.e. the string can look like "++3.59 ± 0.04* ". Note that I tried [0-9] and [:digit:] instead of \d, and that I did read https://www.gnu.org/software/gawk/manual/gawk.html#Regexp

gawk 'BEGIN{test="3.59 ± 0.04";match(test, /^.*?(\d+?\.\d+?)\s*?±\s*?(\d+?\.\d+?)$/, arr);print arr[1];}'
atapaka

1 Answer


You have added too many ? quantifiers, and I think you need to use [0-9] instead of \d, which gawk does not support. Also, when you start with ^[^0-9.]*, only non-numeric characters are "eaten away". So in summary I think you want:

gawk 'BEGIN{test="3.59 ± 0.04";match(test, /^[^0-9.]*([0-9]+\.[0-9]+)\s*±\s*([0-9]+\.[0-9]+)$/, arr);print arr[1];}'

The leading ^[^0-9.]* matches any characters that are neither digits nor dots up to the start of the first number, which is then captured as group 1; the second number is captured as group 2.

Thanks @Ed Morton for the corrections. I did miss the + after the first digit in the original regex.

Lutz
    The meaning of `*?`, `+?`, and `??` in PCRE is that it switches from greedy to stingy matching. This cannot easily be reimplemented with traditional always-greedy regex like Awk's, though it's not clear how exactly the OP thinks this construct is useful here. – tripleee Feb 17 '20 at 19:14