
I am working with ODF formula code that looks something like this: {500mgl} over {4.05grams}

I want to use a regex with gsub in R to enclose in curly brackets all elements matching the pattern

([0-9]+)([A-Za-z]+)

so that units in the denominator display correctly. However, if I do only this, the unit ends up separated from the full number: 4,{05g}. So what I want is to first enclose the numbers that contain a comma:

a<-"4,05g"
gsub("([0-9]+)(\\,)([0-9]+)([A-Za-z]+)","{\\1\\2\\3\\4}",a)
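Running the naive pattern and the comma-first pattern on the same string shows why the comma-first step is needed. This is a minimal sketch using base R's `gsub()` with its default regex engine; the variable names are illustrative, and the comma is written unescaped here since it is not a special character.

```r
a <- "4,05g"

# Naive pattern: "4" is followed by a comma, not a letter, so only "05g" matches
naive <- gsub("([0-9]+)([A-Za-z]+)", "{\\1\\2}", a)

# Comma-first pattern: the whole decimal number stays together with its unit
comma_first <- gsub("([0-9]+)(,)([0-9]+)([A-Za-z]+)", "{\\1\\2\\3\\4}", a)

print(naive)        # "4,{05g}"
print(comma_first)  # "{4,05g}"
```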

and then enclose in brackets the pattern:

([0-9]+)([A-Za-z]+)

but only if it is not preceded by an opening bracket. I've searched the web for how lookbehind syntax works in regex, but I get quite confused about how it works in R's gsub. I tried things like this:

gsub("([^\\.])([0-9]+)([A-Za-z]+)","{\\2\\3}",a)
gsub("(?[\\.])([0-9]+)([A-Za-z]+)","{\\2\\3}",a)
gsub("(!\\.?)([0-9]+)([A-Za-z]+)","{\\2\\3}",a)

but honestly I have no idea what I'm doing.
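For reference, a negative lookbehind is written `(?<!...)`, and in R it is only available when `perl = TRUE` is passed to `gsub()`; the default (TRE) engine does not support lookarounds. The first attempt above consumes the preceding character with `[^\\.]` and the replacement then drops it, while `(!\\.?)` is an ordinary group that matches a literal `!`. The sketch below applies the rule from the question, "not preceded by `{`", to strings where the comma-first step has already run (the inputs are illustrative):

```r
# Strings as they would look after the comma-first step
a <- c("{4,05g}", "30g")

# (?<!\\{) means "not immediately preceded by {"; lookbehind requires perl = TRUE
res <- gsub("(?<!\\{)([0-9]+[A-Za-z]+)", "{\\1}", a, perl = TRUE)

print(res)  # "{4,{05g}}" "{30g}"
```

`{30g}` comes out as intended, but `05g` inside `{4,05g}` is preceded by a comma rather than `{`, so it still gets wrapped. That is the case the EDIT below addresses by excluding the comma instead.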

EDIT: I think the character to exclude before the match should be a comma, not a bracket. That way one would avoid the output

"0,3g" → "0,{3g}"

but still be able to do

"30g" → "{30g}"
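Following the EDIT, here is a minimal sketch of the full two-step approach, assuming `perl = TRUE` is acceptable: step 1 wraps decimal-comma numbers with their unit, and step 2 wraps integer-plus-unit runs only when they are not preceded by a digit, a comma, or `{`. Digits have to be in the excluded set as well; otherwise the engine could start a match in the middle of a number, for example at `5g` inside `{4,05g}`. The inputs are illustrative.

```r
x <- c("4,05g", "30g", "0,3g")

# Step 1: wrap decimal-comma numbers together with their unit
step1 <- gsub("([0-9]+,[0-9]+[A-Za-z]+)", "{\\1}", x)

# Step 2: wrap integer + unit only if not preceded by a digit, a comma or "{"
step2 <- gsub("(?<![0-9,{])([0-9]+[A-Za-z]+)", "{\\1}", step1, perl = TRUE)

print(step2)  # "{4,05g}" "{30g}" "{0,3g}"
```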
Mata
GEX_HEX_420
  • Great effort verbally describing what you want. However, having read it a few times I'm not sure what the expected output looks like. Could you add a specific input and expected output? Is it from `"(500mg"/"L* 10.00) over 4,05"` to `"(500mg)/(L*10.00) over (4,05)"`? – Donald Seinen Nov 30 '21 at 07:52
  • It's like in the picture, in the third line where the code is. I want to enclose in brackets whole numbers with units, {12g}, and numbers with decimals, {12,4g}. The problem is that enclosing whole numbers without the exception for the preceding bracket would separate the number from its decimal part: {4,{2g}}. It would be like your example but with brackets instead of parentheses. However, if I select only patterns with commas, I won't enclose whole numbers with their respective units. – GEX_HEX_420 Nov 30 '21 at 07:55
  • I am not sure if this is what your are looking for, but have you tried to make the comma optional? You can use `,?` to do that. The whole regex then would look like this: `a<-c("4,05g", "50mg", "120,32mg"); gsub("(\\d+,?\\d+\\w+)", "{\\1}", a)`. – Cettt Nov 30 '21 at 08:25
  • Yes @Cettt, I think that works for my intentions. I hadn't thought of it as a whole pattern. Good idea. I'll try it tomorrow on the PC when I get to work. – GEX_HEX_420 Nov 30 '21 at 08:40
  • See https://ideone.com/8AOgQM. `\d+,?\d+\w+` is the wrong pattern here, as it will not let you match single-digit numbers. – Wiktor Stribiżew Nov 30 '21 at 08:42
  • @WiktorStribiżew thank you. Of course you are absolutely right. – Cettt Nov 30 '21 at 09:11
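The single-digit problem Wiktor Stribiżew points out can be reproduced by running Cettt's call on its own inputs plus one single-digit value (`5g` is an illustrative input, not from the thread): `\d+,?\d+` needs at least two digits in total, so `5g` is left untouched.

```r
a <- c("4,05g", "50mg", "120,32mg", "5g")

# Cettt's pattern: \d+ ,? \d+ requires at least two digits before the unit
res <- gsub("(\\d+,?\\d+\\w+)", "{\\1}", a)

print(res)  # "{4,05g}" "{50mg}" "{120,32mg}" "5g"
```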

1 Answer


You can use

x <- "4,05g"
gsub("(\\d+(?:,\\d+)?[[:alpha:]]*)", "{\\1}", x)

See the R demo and the regex demo.

Details:

  • ( - Group 1 start (necessary as gsub does not support backreferences to the whole match):
    • \d+ - one or more digits
    • (?:,\d+)? - an optional sequence of a comma and one or more digits
    • [[:alpha:]]* - zero or more letters
  • ) - end of the group.

The \1 in the replacement is the value of Group 1.
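Applied to the cases from the question, plus a formula-like string modeled on the one at the top (all inputs here are illustrative), the pattern keeps decimal parts and units together. Note that because `[[:alpha:]]*` allows zero letters, a bare number such as `10` gets wrapped as well; whether that is desired depends on the formulas being processed.

```r
x <- c("4,05g", "30g", "0,3g", "5g", "10", "500mgl over 4,05grams")

# \d+ digits, (?:,\d+)? optional decimal-comma part, [[:alpha:]]* optional unit
res <- gsub("(\\d+(?:,\\d+)?[[:alpha:]]*)", "{\\1}", x)

print(res)
# "{4,05g}" "{30g}" "{0,3g}" "{5g}" "{10}" "{500mgl} over {4,05grams}"
```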

Wiktor Stribiżew