JavaScript: Regular Expression to parse a formula

Question

I have been working on a function to parse a formula for some time, but haven't been able to make it work properly. It doesn't always work: it filters out some parts of the text but not all of them.

parseFormula(e) {
    var formula = e.value, value = 0.00, tValue = 0.00, tFormula = '', dObj = {};
    if(formula !== undefined && formula !== "") {
        dObj._formulaIn = formula;
        var f = formula.split(/\s/g);    

        for(var i = 0; i < f.length; i++) {
            tFormula = f[i];
            // Replacing PI
            tFormula = tFormula.replace(/(pi)/gi,Math.PI);
            dObj._1_pi_done = tFormula;

            // Replacing Squareroot with placeholder
            tFormula = tFormula.replace(/(sqrt)/gi,"__sqrt__");
            tFormula = tFormula.replace(/(sqr)/gi,"__sqrt__");
            tFormula = tFormula.replace(/(kvrt)/gi,"__sqrt__");
            tFormula = tFormula.replace(/(kvr)/gi,"__sqrt__");
            dObj._2_sqrt_done = tFormula;

            // Removing units that may cause trouble
            tFormula = tFormula.replace(/(m2||m3||t2||t3||c2||c3)/gi,"");
            dObj._3_units_done = tFormula;

            // Removing text
            tFormula = tFormula.replace(/\D+[^\*\/\+\-]+[^\,\.]/gi,"");
            dObj._4_text_done = tFormula;

            // Removing language specific decimals
            if(Language.defaultLang === "no_NB") {
                tFormula = tFormula.replace(/(\.)/gi,"");
                tFormula = tFormula.replace(/(\,)/gi,".");
            } else {
                tFormula = tFormula.replace(/(\,)/gi,"");           
            }
            dObj._5_lang_done = tFormula;

            // Re-applying Squareroot
            tFormula = tFormula.replace(/(__sqrt__)/g,"Math.sqrt");
            dObj._6_sqrt_done = tFormula;

            if(tFormula === "") {
                f.splice(i,1);
            } else {
                f[i] = tFormula;
            }
            dObj._7_splice_done = tFormula;
            console.log(dObj);
        }

        formula = "";
        for(var j = 0; j < f.length; j++) {
            formula += f[j];   
        }

        try {
            value = eval(formula);
        } 
        catch(err) {}
        return value === 0 ? 0 : value.toFixed(4);
    } else {
        return 0;
    }
}
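For illustration, two of the replacement steps above can be run on their own with made-up sample inputs. The sample strings are hypothetical, not from the question. They show two concrete problems. First, the `__sqrt__` placeholder itself contains `sqr`, so the following `replace` rewrites the placeholder the previous one produced. Second, the `||` in the units pattern creates empty alternatives, and those match before `m3`, `t2` and the others are ever tried.

```javascript
// 1) Chained placeholder replacements: "__sqrt__" contains "sqr",
//    so the second replace rewrites the placeholder created by the first.
let s = "sqrt(4)";
s = s.replace(/(sqrt)/gi, "__sqrt__");
s = s.replace(/(sqr)/gi, "__sqrt__");
console.log(s); // "____sqrt__t__(4)"

// A single alternation applied in one pass avoids this. "sqrt" is listed
// before "sqr" (and "kvrt" before "kvr") so the longer name wins.
const once = "sqrt(4) sqr(9) kvr(16)".replace(/sqrt|sqr|kvrt|kvr/gi, "__sqrt__");
console.log(once); // "__sqrt__(4) __sqrt__(9) __sqrt__(16)"

// 2) "||" inside the group creates empty alternatives. At every position
//    where "m2" fails, the empty alternative matches, so "m3" etc. are never tried.
const units = /(m2||m3||t2||t3||c2||c3)/gi;
const brokenM2 = "12m2".replace(units, "");
const brokenM3 = "12m3".replace(units, "");
console.log(brokenM2, brokenM3); // "12" "12m3"

// With a single "|" between alternatives the pattern behaves as intended.
const fixedM3 = "12m3".replace(/m2|m3|t2|t3|c2|c3/gi, "");
console.log(fixedM3); // "12"
```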

I am not sure about any of the regular expressions used in this function, which is why I am asking for help. For example, I am not sure whether /(pi)/ is the right way to match the exact text "pi" and replace it with 3.141.
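A minimal check of the `pi` question, using a made-up input string. The parentheses in `/(pi)/gi` only create a capture group; they do not restrict the match to a whole word, so the pattern also hits `pi` inside words such as "pieces" and "spin". Surrounding the word with `\b` word boundaries restricts it to a standalone `pi`. Note that `\b` also rejects `2pi`, because digits count as word characters.

```javascript
// Hypothetical sample input: "pieces" and "spin" contain "pi" inside a word.
const input = "2 pieces * pi + spin";

// The original pattern: the group does not limit the match to a whole word.
const loose = input.replace(/(pi)/gi, Math.PI);

// \b asserts a word boundary on both sides, so only a standalone "pi" matches.
const strict = input.replace(/\bpi\b/gi, String(Math.PI));

console.log(loose);  // "2 3.141592653589793eces * 3.141592653589793 + s3.141592653589793n"
console.log(strict); // "2 pieces * 3.141592653589793 + spin"
```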

(I am using eval at the moment, but it's merely used for development)

Any help appreciated.

Edit:

The formula I am trying to parse is user input. The user would type something like: 2/0.6 pcs of foo * pi bar + sqrt(4) foobar. I want to strip all the non-math words and calculate the rest, so the formula above should be interpreted as (2/0.6) * 3.141 + Math.sqrt(4) => 12.47
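The expected result can be checked directly. This snippet simply restates the interpretation above as a JavaScript expression. `toFixed(4)` mirrors the format `parseFormula` returns.

```javascript
// Intended reading of "2/0.6 pcs of foo * pi bar + sqrt(4) foobar"
// once the non-math words are dropped.
const value = (2 / 0.6) * Math.PI + Math.sqrt(4);

console.log(value.toFixed(2)); // "12.47"
console.log(value.toFixed(4)); // "12.4720", the format parseFormula returns
```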

Edit 2:

e is an ExtJS object, passed in by a field in a grid. It contains the following properties:

colIdx (int)
column (Ext.grid.column.Column)
field (string)
grid (Ext.grid.Panel)
originalValue (string)
record (Ext.data.Model)
row (css selector)
rowIdx (int)
store (Ext.data.Store)
value (string)
view (Ext.grid.View)

I am currently unable to get a JSFiddle working.

I am wondering what I'm doing wrong in my regexes, and whether there are any obvious mistakes. — GauteR, Aug 27 '13 at 07:49
It's a function I use in ExtJS 4.2.x - for evaluating a field in the grid. — GauteR, Aug 27 '13 at 07:50
Please define what `e` could be, and create a fiddle. We need to know if a solution works or not. — Brigand, Aug 27 '13 at 07:58
A suggestion: add to your question an example `formula` you would like to parse. Without it it is very hard to validate the regexs you are trying to use. This might help everyone understand better what you want to accomplish and will get you more help. — Joum, Aug 27 '13 at 07:59
Also, I don't want to put you down from your attempts at all, but it seems that this might become very helpful: http://zaach.github.io/jison/demos/calc/ — Joum, Aug 27 '13 at 08:05
@FakeRainBrigand e is a ExtJS object, passed through by a grid, it contains the following variables: colIdx (int), column (Ext.grid.column.Column), field (string), grid (Ext.grid.Panel), originalValue (string), record (Ext.data.Model), row (css selector), rowIdx (int), store (Ext.data.Store), value (string), view (Ext.grid.View) — GauteR, Aug 27 '13 at 09:10

Bart · Accepted Answer · 2013-08-27T18:12:20.250

It's probably easier to tokenize the expression you want to parse. Once it is tokenized, it's much easier to read that stream of tokens and build your own expressions from it.

I've put up a demo on jsFiddle that can parse your formula.

In the demo I used this Tokenizer class and tokens to build a TokenStream from the formula.

function Tokenizer() {
    this.tokens = {};
    // The regular expression which matches a token per group.
    this.regex = null;
    // Holds the names of the tokens. Index matches group. See buildExpression()
    this.tokenNames = [];
}

Tokenizer.prototype = {
    addToken: function(name, expression) {
        this.tokens[name] = expression;
    },

    tokenize: function(data) {
        this.buildExpression(data);
        var tokens = this.findTokens(data);
        return new TokenStream(tokens);
    },

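    buildExpression: function (data) {
        // Reset the names so repeated tokenize() calls (or tokens added
        // between calls) keep group indices and names aligned
        this.tokenNames = [];
        var tokenRegex = [];
        for (var tokenName in this.tokens) {
            this.tokenNames.push(tokenName);
            tokenRegex.push('('+this.tokens[tokenName]+')');
        }

        this.regex = new RegExp(tokenRegex.join('|'), 'g');
    },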

    findTokens: function(data) {
        var tokens = [];
        var match;

        while ((match = this.regex.exec(data)) !== null) {
            if (match == undefined) {
                continue;
            }

            for (var group = 1; group < match.length; group++) {
                if (!match[group]) continue;

                tokens.push({
                    name: this.tokenNames[group - 1],
                    data: match[group]
                });
            }
        }

        return tokens;
    }
}
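One property of the `exec` loop in `findTokens` is worth knowing. With the `g` flag, each `exec` call searches forward from where the previous match ended, so characters that no token pattern matches are skipped silently rather than reported. The token list in this answer defines no token for `-`, for example, so a `-` would simply vanish from the stream. The sketch below uses a cut-down two-token pattern (a hypothetical subset, not the full token list) to show this. It also adds a catch-all group as one possible way to surface unexpected characters.

```javascript
// Collects every match of a global regex, looping the same way findTokens does.
function collect(regex, input) {
  const found = [];
  let match;
  // With the g flag, each exec() call resumes at regex.lastIndex.
  while ((match = regex.exec(input)) !== null) {
    found.push(match[0]);
  }
  return found;
}

const input = "2 - 3 + 4";

// Only an int group and an add group: the "-" is skipped with no error.
const skipped = collect(/([0-9]+)|(\+)/g, input);
console.log(skipped); // [ '2', '3', '+', '4' ]

// A last catch-all group for any other non-space character reports it instead.
const reported = collect(/([0-9]+)|(\+)|(\S)/g, input);
console.log(reported); // [ '2', '-', '3', '+', '4' ]
```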


function TokenStream(tokens) {
    this.cursor = 0;
    this.tokens = tokens;
}
TokenStream.prototype = {
    next: function () {
        return this.tokens[this.cursor++];
    },
    peek: function (direction) {
        if (direction === undefined) {
            direction = 0;
        }

        return this.tokens[this.cursor + direction];
    }
}

Defined tokens

var tokenizer = new Tokenizer();
tokenizer.addToken('whitespace', '\\s+');
tokenizer.addToken('l_paren', '\\(');
tokenizer.addToken('r_paren', '\\)');
tokenizer.addToken('float', '[0-9]+\\.[0-9]+');
tokenizer.addToken('int', '[0-9]+');
tokenizer.addToken('div', '\\/');
tokenizer.addToken('mul', '\\*');
tokenizer.addToken('add', '\\+');
tokenizer.addToken('constant', 'pi|PI');
tokenizer.addToken('id', '[a-zA-Z_][a-zA-Z0-9_]*');
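The order of these definitions matters. `buildExpression` joins the patterns into one alternation in the order they were added, and a JavaScript alternation takes the first alternative that matches, not the longest. A small check using literal patterns copied from the list above illustrates this. The lookahead variant at the end is a hypothetical adjustment, not part of the answer. Its inner group is non-capturing on purpose, because `findTokens` maps capture-group indices to token names.

```javascript
// The first alternative that matches wins, not the longest one.
const intFirst = "0.6".match(/[0-9]+|[0-9]+\.[0-9]+/g);   // ["0", "6"]
const floatFirst = "0.6".match(/[0-9]+\.[0-9]+|[0-9]+/g); // ["0.6"]

// "constant" before "id" makes a standalone "pi" a constant, but it also
// splits "pi" off the front of a longer word such as "pizza".
const split = "pi pizza".match(/(pi|PI)|([a-zA-Z_][a-zA-Z0-9_]*)/g); // ["pi", "pi", "zza"]

// Hypothetical adjustment: a negative lookahead rejects "pi" when a word
// character follows. (?:...) adds no capture group, so indices stay aligned.
const guarded = "pi pizza".match(
  /((?:pi|PI)(?![a-zA-Z0-9_]))|([a-zA-Z_][a-zA-Z0-9_]*)/g
); // ["pi", "pizza"]

console.log(intFirst, floatFirst, split, guarded);
```

In the tokenizer, that adjustment would correspond to registering the constant as `'(?:pi|PI)(?![a-zA-Z0-9_])'`.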

With the above tokens defined, the tokenizer can recognize everything in your formula. When the formula

2/0.6 pcs of foo * pi bar + sqrt(4) foobar

is tokenized, the result is a token stream similar to

int(2), div(/), float(0.6), whitespace( ), id(pcs), whitespace( ), id(of), whitespace( ), id(foo), whitespace( ), mul(*), whitespace( ), constant(pi), whitespace( ), id(bar), whitespace( ), add(+), whitespace( ), id(sqrt), l_paren((), int(4), r_paren()), whitespace( ), id(foobar)
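The stream still has to be turned into a value. The sketch below shows one way to do that with a small recursive-descent parser. It is not the code from the jsFiddle demo, which is not reproduced here. To keep it self-contained, it uses a compact stand-in for the `Tokenizer` above, with the same token names and order. The names `TOKENS`, `FUNCTIONS`, `tokenize` and `evaluate` are hypothetical. It also assumes that "strip the non-math words" means dropping whitespace and any `id` token that is not a known function name.

```javascript
// Compact stand-in for the Tokenizer above: same token names, same order.
const TOKENS = [
  ["whitespace", "\\s+"], ["l_paren", "\\("], ["r_paren", "\\)"],
  ["float", "[0-9]+\\.[0-9]+"], ["int", "[0-9]+"], ["div", "\\/"],
  ["mul", "\\*"], ["add", "\\+"], ["constant", "pi|PI"],
  ["id", "[a-zA-Z_][a-zA-Z0-9_]*"],
];

function tokenize(text) {
  const regex = new RegExp(TOKENS.map(([, p]) => "(" + p + ")").join("|"), "g");
  const tokens = [];
  let match;
  while ((match = regex.exec(text)) !== null) {
    // Exactly one group is defined per match; its index gives the token name.
    const group = match.findIndex((part, i) => i > 0 && part !== undefined);
    tokens.push({ name: TOKENS[group - 1][0], data: match[group] });
  }
  return tokens;
}

// Hypothetical function table: an "id" token is kept only if it names one of these.
const FUNCTIONS = new Map([["sqrt", Math.sqrt]]);

function evaluate(text) {
  // Drop whitespace and non-math words; keep everything else in order.
  const tokens = tokenize(text).filter(
    (t) => t.name !== "whitespace" && (t.name !== "id" || FUNCTIONS.has(t.data))
  );
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const expect = (name) => {
    const t = next();
    if (!t || t.name !== name) throw new Error("Expected " + name);
  };

  // expr := term ("+" term)*
  function expr() {
    let value = term();
    while (peek() && peek().name === "add") {
      next();
      value += term();
    }
    return value;
  }

  // term := factor (("*" | "/") factor)*
  function term() {
    let value = factor();
    while (peek() && (peek().name === "mul" || peek().name === "div")) {
      const op = next().name;
      const rhs = factor();
      value = op === "mul" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := int | float | constant | function "(" expr ")" | "(" expr ")"
  function factor() {
    const t = next();
    if (!t) throw new Error("Unexpected end of formula");
    if (t.name === "int" || t.name === "float") return parseFloat(t.data);
    if (t.name === "constant") return Math.PI;
    if (t.name === "id") {
      expect("l_paren");
      const arg = expr();
      expect("r_paren");
      return FUNCTIONS.get(t.data)(arg);
    }
    if (t.name === "l_paren") {
      const inner = expr();
      expect("r_paren");
      return inner;
    }
    throw new Error("Unexpected token: " + t.data);
  }

  const result = expr();
  if (pos !== tokens.length) throw new Error("Unexpected token: " + peek().data);
  return result;
}

console.log(evaluate("2/0.6 pcs of foo * pi bar + sqrt(4) foobar").toFixed(2)); // "12.47"
```

Each grammar rule maps to one function, so `*` and `/` bind tighter than `+` without any precedence table. Subtraction is absent only because the token list above has no `-` token.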

@AriPorad Thank you for the suggestion. It's part of a personal project I'm working on so maybe in the future it will :-) — Bart, Sep 01 '13 at 07:42
This is fantastic, implemented this on an advanced search field instead of a single huge regular expression that didn't work properly, this worked a charm! — Martin-Brennan, Feb 12 '15 at 06:45

Nikola Dimitroff · Answer 2 · 2013-08-27T09:50:04.287


You cannot really use a regular expression to match a formula. Formulae form a context-free language (nested parentheses must balance), while regular expressions are limited to regular languages, which are a strict subset of the context-free languages. There are a number of algorithms for recognizing context-free languages, such as CYK and LL parsers. I don't recommend studying those if you haven't already, since the topic is quite large.
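Balanced parentheses make the regular/context-free distinction concrete. Checking them needs a counter that can grow without bound, which is exactly the memory a finite automaton lacks. JavaScript's `RegExp` goes somewhat beyond strictly regular languages (it has backreferences), but it has no recursion, so it still cannot match arbitrarily deep nesting. A plain loop handles it easily. This is a sketch, and `parensBalanced` is a hypothetical name.

```javascript
// Tracks nesting depth; the counter can grow without bound, unlike the
// fixed set of states in a finite automaton.
function parensBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth < 0) return false; // a ")" with no matching "("
    }
  }
  return depth === 0; // every "(" must have been closed
}

console.log(parensBalanced("sqrt((1 + 2) * (3 + 4))")); // true
console.log(parensBalanced("(1 + 2))("));               // false
```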

What you can do quickly, efficiently and easily, though, is attempt to calculate the formula using Reverse Polish Notation (RPN), converting your formula to RPN with the Shunting Yard algorithm. If the attempt fails (due to mismatched parentheses, invalid functions or constants, or whatever else), the text is clearly not a formula; otherwise all is good. Shunting Yard is not a particularly difficult algorithm and you should have no trouble implementing it. Even if you do, the Wikipedia page on it has pseudocode, and there are a good number of questions on SO to help you as well.
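Here is a compact sketch of the suggested approach. The Shunting Yard algorithm converts the formula to RPN, and a value stack then evaluates the RPN. It assumes only the binary left-associative operators `+ - * /`, parentheses, one-argument functions from a lookup table, and the constant `pi`. Unary minus and multi-argument functions are left out. `toRPN`, `evalRPN`, `PRECEDENCE`, `FUNCTIONS` and `CONSTANTS` are hypothetical names. Any unrecognized token, and any parenthesis mismatch, throws. That is the "not a formula" signal described above.

```javascript
// Binary operators and their precedence (all treated as left-associative).
const PRECEDENCE = new Map([["+", 1], ["-", 1], ["*", 2], ["/", 2]]);
// One-argument functions and named constants the formula may use.
const FUNCTIONS = new Map([["sqrt", Math.sqrt]]);
const CONSTANTS = new Map([["pi", Math.PI]]);

// Shunting Yard: converts infix text to an array in Reverse Polish Notation.
function toRPN(text) {
  // Numbers, names, or any other single non-space character.
  const tokens = text.match(/\d+(?:\.\d+)?|[A-Za-z_]\w*|\S/g) || [];
  const output = [];
  const ops = [];
  const top = () => ops[ops.length - 1];

  for (const tok of tokens) {
    if (/^\d/.test(tok)) {
      output.push(parseFloat(tok));
    } else if (CONSTANTS.has(tok.toLowerCase())) {
      output.push(CONSTANTS.get(tok.toLowerCase()));
    } else if (FUNCTIONS.has(tok)) {
      ops.push(tok);
    } else if (PRECEDENCE.has(tok)) {
      // Pop operators of higher or equal precedence (left associativity).
      while (PRECEDENCE.has(top()) && PRECEDENCE.get(top()) >= PRECEDENCE.get(tok)) {
        output.push(ops.pop());
      }
      ops.push(tok);
    } else if (tok === "(") {
      ops.push(tok);
    } else if (tok === ")") {
      while (ops.length > 0 && top() !== "(") output.push(ops.pop());
      if (ops.length === 0) throw new Error("Mismatched parentheses");
      ops.pop(); // discard the "("
      // A function name just before "(" applies to the bracketed value.
      if (FUNCTIONS.has(top())) output.push(ops.pop());
    } else {
      throw new Error("Unknown token: " + tok);
    }
  }
  while (ops.length > 0) {
    const op = ops.pop();
    if (op === "(") throw new Error("Mismatched parentheses");
    output.push(op);
  }
  return output;
}

// Evaluates an RPN array produced by toRPN using a value stack.
function evalRPN(rpn) {
  const stack = [];
  for (const item of rpn) {
    if (typeof item === "number") {
      stack.push(item);
    } else if (FUNCTIONS.has(item)) {
      if (stack.length < 1) throw new Error("Missing argument for " + item);
      stack.push(FUNCTIONS.get(item)(stack.pop()));
    } else {
      if (stack.length < 2) throw new Error("Missing operand for " + item);
      const b = stack.pop();
      const a = stack.pop();
      if (item === "+") stack.push(a + b);
      else if (item === "-") stack.push(a - b);
      else if (item === "*") stack.push(a * b);
      else stack.push(a / b);
    }
  }
  if (stack.length !== 1) throw new Error("Not a well-formed formula");
  return stack[0];
}

console.log(evalRPN(toRPN("2/0.6 * pi + sqrt(4)")).toFixed(2)); // "12.47"
```

Supporting more functions such as `log` or `cos` would mean adding entries like `["log", Math.log]` to `FUNCTIONS`. An `IF` that takes several arguments would additionally need argument counting, which this sketch does not do.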

Answered Aug 27 '13 at 09:40 by Nikola Dimitroff; edited Aug 27 '13 at 09:50

Thanks for answering, but I am not trying to use RegEx to recognize a context-free language; I want to use predefined calls to parse the formula. My original idea was to replace the call for SQRT with an ASCII code, which I would then replace with Math.sqrt after stripping off the unused characters. I also intend to use the same function for other functions, such as IF, LOG, COS, SIN, ARCSIN, etc. – GauteR Aug 27 '13 at 10:08
Which is why I suggested using Shunting Yard. You can use whatever functions and constants you want with it. Besides, it is a widely known algorithm, so implementing it will be much less error-prone (and more maintainable: imagine having to grasp your code a year from now) than your current attempt, which is based on replacing certain magic strings in different parts of the function. – Nikola Dimitroff Aug 27 '13 at 16:15
