I'm trying to convert some math-related strings containing absolute values, using regex in JavaScript. I would like to convert all occurrences of |foo| to abs(foo).

How can I detect if the character is opening or closing, given that they could also be nested? Basically I would like to convert all occurrences of opening | to abs( and all closing | to ). Whatever is between the vertical bars is unchanged.

Some examples of possible input and desired output:
|x|+12
abs(x)+12

|x|+12+|x+2|
abs(x)+12+abs(x+2)

|x|+|x+|z||
abs(x)+abs(x+abs(z))

Any ideas?

Appe
  • The input language you're looking at is non-regular, so it cannot be (correctly) parsed by regular expressions. – Siguza Mar 10 '21 at 02:15
  • Particularly the nesting aspect seems nasty. You can use Regex to build a lexer, but you'll need a full parser for its output... – Siguza Mar 10 '21 at 02:16
  • There are regex dialects that support nesting, but JavaScript's is not one of them. You could however do this in several steps: 1. tag the `|`s with nesting level (+1, -1) as you go from left to right. 2. identify start and end `|` of same level from left to right. 3. Clean up the tagged `|`. See https://twiki.org/cgi-bin/view/Blog/BlogEntry201109x3 – Peter Thoeny Mar 10 '21 at 02:24

1 Answer

There are regex dialects that support nesting, but JavaScript's is not one of them. You can however do this in steps:

  1. tag the |s with nesting level (+1, -1, as you go from left to right)
  2. identify start and end | of same level from left to right based on tags, and from lowest level to highest level
  3. clean up leftover tags in case of unbalanced input (they are turned back into |)
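
To make the intermediate form concrete, here is a minimal sketch of step 1 on its own. The helper name `tagBars` is hypothetical and only used for illustration; it assumes, like the full code, that an opening `|` sits directly before a variable or number and a closing `|` directly after one.

```javascript
// Hypothetical helper for illustration: performs only step 1 (tagging).
// Assumes every `|` is adjacent to a word (variable or number).
function tagBars(str) {
  let level = 0;
  return str.replace(/(\|*)(\w+)(\|*)/g, (match, open, word, close) => {
    let out = '';
    // each leading `|` becomes a start tag; the level goes up afterwards
    for (let i = 0; i < open.length; i++) out += '{{s' + (level++) + '}}';
    out += word;
    // each trailing `|` first lowers the level, then becomes an end tag
    for (let i = 0; i < close.length; i++) out += '{{e' + (--level) + '}}';
    return out;
  });
}

console.log(tagBars('|x|+|x+|z||'));
// {{s0}}x{{e0}}+{{s0}}x+{{s1}}z{{e1}}{{e0}}
```

Once every bar carries its level, a start tag and the nearest end tag with the same number always belong together, which is what step 2 relies on.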

Working code, with test cases nested up to 3 levels deep (the code works for any depth):

function fixAbs(str) {
  const startTag = '{{s%L%}}';
  const endTag   = '{{e%L%}}';
  const absRegex = /\{\{s(\d+)\}\}(.*?)\{\{e\1\}\}/g;
  let level = 0;
  str = str
  .replace(/ /g, '')  // remove all spaces
  .replace(/(\|*)?(\w+)(\|*)?/g, function(m, c1, c2, c3) {
    // regex matches variables with all leading and trailing `|`s
    let s = c2;
    if(c1) {
      // add a start tag to each leading `|`: `{{s0}}`, `{{s1}}`, ...
      // and post-increase level
      s = '';
      for(let i = 0; i < c1.length; i++) {
        s += startTag.replace(/%L%/, level++);
      }
      s += c2;
    }
    if(c3) {
      // decrease level,
      // and add a end tag to each trailing `|`: `{{e2}}`, `{{e1}}`, ...
      for(let i = 0; i < c3.length; i++) {
        s += endTag.replace(/%L%/, --level);
      }
    }
    return s;
  });
  // find matching start and end tag from left to right,
  // repeat for each level
  while(str.match(absRegex)) {
    str = str.replace(absRegex, function(m, level, inner) {
      // `inner` is the text between a matching start/end tag pair
      return 'abs(' + inner + ')';
    });
  }
  // turn any unmatched tags back into `|` (unbalanced input)
  str = str.replace(/\{\{[se]-?\d+\}\}/g, '|');
  return str;
}

// forEach returns undefined, so keep the array and the loop separate
const testCases = [
  '|x|+12',
  '|x|+|y+|z||',
  '|x|+||y|+z|',
  '|x|+|x+|y|+z|',
  '|x|+|x+|y+|t||+z|',
  '|x|+12+|2+x|',
  '|x|+12+|x+2|'
];
testCases.forEach(str => {
  const result = fixAbs(str);
  console.log('"' + str + '" ==> "' + result + '"');
});

Output:

"|x|+12" ==> "abs(x)+12"
"|x|+|y+|z||" ==> "abs(x)+abs(y+abs(z))"
"|x|+||y|+z|" ==> "abs(x)+abs(abs(y)+z)"
"|x|+|x+|y|+z|" ==> "abs(x)+abs(x+abs(y)+z)"
"|x|+|x+|y+|t||+z|" ==> "abs(x)+abs(x+abs(y+abs(t))+z)"
"|x|+12+|2+x|" ==> "abs(x)+12+abs(2+x)"
"|x|+12+|x+2|" ==> "abs(x)+12+abs(x+2)"

Code is annotated with comments for clarity.

This is based on a TWiki blog at https://twiki.org/cgi-bin/view/Blog/BlogEntry201109x3
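
As a possible alternative (not the tagging method above), the opening/closing decision can also be sketched as a single left-to-right scan without regex matching of the bars. The function name `absScan` and the operator list are assumptions made for this example: a `|` is treated as opening when it appears at the start of the string or right after an operator, a comma, `(`, or another opening bar, and as closing otherwise.

```javascript
// Hypothetical alternative sketch: classify each `|` by what precedes it.
function absScan(str) {
  const src = str.replace(/\s+/g, ''); // remove all whitespace
  let out = '';
  let prev = '';  // last significant character seen; '' at the start
  let depth = 0;  // number of abs( groups currently open
  for (const ch of src) {
    if (ch !== '|') {
      out += ch;
      prev = ch;
      continue;
    }
    // Assumption: an operand is expected at the start and after these characters.
    const opens = prev === '' || '+-*/^(,='.includes(prev);
    if (opens) {
      out += 'abs(';
      depth++;
      prev = '(';   // a following `|` is then also an opening bar
    } else if (depth > 0) {
      out += ')';
      depth--;
      prev = ')';   // a following `|` is then also a closing bar
    } else {
      out += '|';   // unbalanced closing bar: leave it untouched
      prev = '|';
    }
  }
  return out;
}

console.log(absScan('|x|+|x+|z||')); // abs(x)+abs(x+abs(z))
console.log(absScan('|(x+1)|*2'));   // abs((x+1))*2
```

Because this sketch looks at the preceding character rather than at adjacent words, it also accepts bars next to parentheses. It does not handle implicit multiplication such as 2|x|, where the bar after 2 would be read as a closing bar and left unchanged.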

Peter Thoeny
  • Wow, very nice! Thank you for taking the time @PeterThoney. It works very well. I did find one case that didn't pass though: "|x|+12+|2+x|" or "|x|+12+|x+2|". I will look closer at the function, maybe I'll figure it out. I'll let you know. – Appe Mar 10 '21 at 17:18