Workaround for this calculator parsing error

Question

Context :

I entered the expression 3.24 * 10^10 + 1 into a calculator that I made. To solve it, my calculator first looks for the pattern number_a^number_b, parses the two numbers into doubles using Double.parseDouble(), then performs Math.pow(number_a, number_b) and replaces the matched text with the result.

The calculator then looks for the pattern number_a * number_b in the same way and evaluates it. At this point the expression has become 3.24E10 + 1. Now comes the tricky part. I programmed the calculator on the assumption that it should next find the pattern number_a + number_b and evaluate it. It does exactly that, and the result is, unexpectedly but justifiably, 3.24E11.0: the pattern matches the 10 and the 1 on either side of the +, ignoring the 3.24E in front.

I am looking for a workaround to make my calculator smart enough to handle such expressions.

Important information - Regex example = ([\\d\\.]+)\\*([\\d\\.]+)

Code example -

// here 'expression' is a StringBuilder type
// only a (modified) snippet of actual code.

Matcher m = Pattern.compile ("([\\d\\.]+)\\^([\\d\\.]+)")
                           .matcher (expression.toString());
while (m.find()) {
     Double d1 = Double.parseDouble(m.group(1));
     Double d2 = Double.parseDouble(m.group(2));
     Double d3 = Math.pow(d1, d2);
     expression.replace(m.start(), m.end(), Double.toString(d3));
     m.reset(expression);
}
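The failure described above can be reproduced in isolation. This is a minimal sketch, assuming the addition pass uses the same number pattern and the same replace-and-reset loop as the snippet; the class and method names (AdditionBugDemo, reduceSums) are made up for the illustration and are not from the actual calculator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AdditionBugDemo {
    // Same number pattern as the question, applied to '+'.
    // It has no notion of the 'E' that Double.toString() emits for large values.
    private static final Pattern NAIVE_SUM = Pattern.compile("([\\d\\.]+)\\+([\\d\\.]+)");

    static String reduceSums(String input) {
        StringBuilder expression = new StringBuilder(input);
        Matcher m = NAIVE_SUM.matcher(expression);
        while (m.find()) {
            double sum = Double.parseDouble(m.group(1)) + Double.parseDouble(m.group(2));
            expression.replace(m.start(), m.end(), Double.toString(sum));
            m.reset(expression); // rescan the edited text from the start
        }
        return expression.toString();
    }

    public static void main(String[] args) {
        // The match is "10+1", so only that part is replaced by "11.0".
        System.out.println(reduceSums("3.24E10+1")); // prints 3.24E11.0
    }
}
```

The first position where the pattern can match is the 10 after the E, so the mantissa is left untouched and the exponent digits are added to 1.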

PS : Many people seem to think, based on how I presented the question, that my calculator is a failed attempt because regex won't take me very far. Of course, I agree that is true and that far better algorithms may exist. I just want to make clear that:

1) Regex is only used for parsing expressions in direct form. I don't use regex for everything. Nested brackets are solved using recursion. Regex only comes into play at the last step, when all the processing work has been done and only simple calculation remains.

2) My calculator works fine. It can and does solve nested expressions gracefully. Proof - 2^3*2/4+1 --> 5.0, sin(cos(1.57) + tan(cos(1.57)) + 1.57) --> 0.9999996829318346, ((3(2log(10))+1)+1)exp(0) --> 8.0

3) It does not use too many 'crutches'. If you think I have written thousands of lines of code to get this functionality: no. 200 lines, and that's it. And I have no intention of dumping my application (which is near completion).

For those who downvote, please always tell why you felt this question as unfit for this site. It helps me to learn how to ask right questions. — Sarthak123, Jun 15 '18 at 06:53
Here is your problem: *"My calculator uses regex to solve expression"*. Regexes are the wrong tool for this. — Stephen C, Jun 15 '18 at 07:01
If you still want to use regex, your capture group should look like this: `(\\d+(\\.\\d+)?(e\\d+)?)`. And your final regex like this: `(\\d+(\\.\\d+)?(e\\d+)?)\\^(\\d+(\\.\\d+)?(e\\d+)?)` — Lino, Jun 15 '18 at 07:04
You don't need a workaround. You need a solution, and you're presently using entirely the wrong technology, as @StephenC states. Regular expressions cannot handle operator precedence. The solution is not to install a workaround and have a piece of software on a thousand crutches. It is to use the right technology. Have a look for 'recursive descent expression parser' or the Dijkstra Shunting-yard algorithm. — user207421, Jun 15 '18 at 07:05
OK, I accept that it is down to my lack of experience that I used regex where it is not a good fit. But instead of mocking me for using regex, I would be grateful if you could justify why regex is not a good fit (don't cite this question as the example) and explain what exactly makes the other algorithms superior. I am insisting because my calculator works just fine, whether you believe it or not. It can handle complex nested calculations (not everything is based on regex) and also supports maths functions like sin, cos, etc. — Sarthak123, Jun 15 '18 at 07:10
@EJP mind you it well handles operator precedence (including brackets) example - 2+3/1.5*4^2 nicely returns 34.0 — Sarthak123, Jun 15 '18 at 07:16
"For those who downvote, please always tell why you felt this question as unfit for this site. It helps me to learn how to ask right question" - I did not downvote, but I think a post with a [mcve] and a concise question / problem statement, avoiding long explanations, would have been much clearer. — c0der, Jun 15 '18 at 07:21
@Sarthak123 you just have to change, which capture group you're converting to a double, as I made multiple — Lino, Jun 15 '18 at 07:21
@Sarthak123 you might also want to include negative signs for the base and the e-exponent — Lino, Jun 15 '18 at 07:25
@Sarthak123 So you equipped it with enough crutches for that. All you need now is a formal proof of the technique. Recursive descent has had one since 1959, and the shunting-yard since 1960-61. The inadequacy of regular expressions for context-free grammars was established in 1956. — user207421, Jun 15 '18 at 07:51
@Lino it works, and without much hassle (only a single-line edit)! Thank you. You might as well post this as an answer. — Sarthak123, Jun 15 '18 at 07:56

Stephen C · Answer 1 · 2018-06-15T07:51:53.913

if you could provide me a justification for why the regex is not a good fit

1. A true regular expression cannot properly parse nested / balanced brackets. (OK, it is possible to use advanced regex features to do it, but the result is hellishly difficult to understand¹.)
2. A true regular expression will have difficulty analyzing an expression with operators that have different precedence. Especially with brackets. (I'm not sure if it is impossible, but it is certainly difficult.)
3. Once you have used your regex(es) to match the expression, you then have the problem of sorting out the "groups" that you have matched into something that allows you to (correctly) evaluate the expression.
4. A regex cannot produce any explanation if the input is syntactically invalid.
5. Complicated regexes are often pathologically expensive ... especially for large input strings that are incorrect.

what exactly do the other algorithms have that make them superior.

A properly written or generated lexer + parser will have none of the above problems. You can either evaluate the expression on the fly, or you can turn it into a parse tree that can be evaluated repeatedly; e.g. with different values for variables.
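To make "evaluate the expression on the fly" concrete, here is a minimal recursive-descent sketch. It is an illustration only, not the asker's calculator: the class name ExprParser, the grammar in the comments, and the choice to bind unary minus looser than ^ are all assumptions, and functions such as sin or implicit multiplication are left out.

```java
final class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String text) {
        // Assumption: whitespace is insignificant, so it is dropped up front.
        this.src = text.replaceAll("\\s+", "");
    }

    /** Evaluates the text, or throws IllegalArgumentException saying where parsing failed. */
    static double evaluate(String text) {
        ExprParser p = new ExprParser(text);
        double value = p.expr();
        if (p.pos < p.src.length()) throw p.error("unexpected '" + p.src.charAt(p.pos) + "'");
        return value;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double value = term();
        while (true) {
            if (eat('+')) value += term();
            else if (eat('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (true) {
            if (eat('*')) value *= factor();
            else if (eat('/')) value /= factor();
            else return value;
        }
    }

    // factor := '-' factor | primary ('^' factor)?   ('^' is right-associative)
    private double factor() {
        if (eat('-')) return -factor();
        double base = primary();
        return eat('^') ? Math.pow(base, factor()) : base;
    }

    // primary := number | '(' expr ')'
    private double primary() {
        if (eat('(')) {
            double value = expr();
            if (!eat(')')) throw error("expected ')'");
            return value;
        }
        int start = pos;
        while (digitAt(pos) || charAt(pos) == '.') pos++;
        if (pos > start && (charAt(pos) == 'e' || charAt(pos) == 'E')) {
            int mark = pos++;                        // tentatively consume the exponent marker
            if (charAt(pos) == '+' || charAt(pos) == '-') pos++;
            int firstDigit = pos;
            while (digitAt(pos)) pos++;
            if (pos == firstDigit) pos = mark;       // no digits followed: not an exponent
        }
        if (pos == start) throw error("expected a number");
        try {
            return Double.parseDouble(src.substring(start, pos));
        } catch (NumberFormatException e) {
            throw error("malformed number '" + src.substring(start, pos) + "'");
        }
    }

    private char charAt(int i) { return i < src.length() ? src.charAt(i) : '\0'; }

    private boolean digitAt(int i) { return charAt(i) >= '0' && charAt(i) <= '9'; }

    private boolean eat(char c) {
        if (charAt(pos) != c) return false;
        pos++;
        return true;
    }

    private IllegalArgumentException error(String message) {
        // The position refers to the input with whitespace removed.
        return new IllegalArgumentException(message + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("3^-(2-3)"));    // prints 3.0
        System.out.println(evaluate("2+3/1.5*4^2")); // prints 34.0
    }
}
```

Each precedence level is one method, so precedence and nesting fall out of the call structure, and a syntax error is reported at the point where the grammar stops matching.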

The shunting-yard algorithm (while of more limited application) also has none of the above problems.
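A minimal sketch of a shunting-yard style evaluator follows, with the operator stack and value stack evaluated as tokens arrive rather than emitting postfix first. The class name ShuntingYard, the '~' marker for unary minus, and the precedence numbers are assumptions chosen for the illustration. A regex is used here only to split the input into tokens, which is a job regular expressions are suited to.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShuntingYard {
    // One token: a number (optional fraction and exponent) or a single operator/parenthesis.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?|[-+*/^()])");
    private static final char NEGATE = '~'; // internal marker for unary minus

    private static int precedence(char op) {
        switch (op) {
            case '^': return 4;
            case NEGATE: return 3;
            case '*': case '/': return 2;
            case '+': case '-': return 1;
            default: return 0; // '(' never gets popped by an operator
        }
    }

    private static void apply(char op, Deque<Double> values) {
        if (values.size() < (op == NEGATE ? 1 : 2)) throw new IllegalArgumentException("missing operand");
        double right = values.pop();
        if (op == NEGATE) { values.push(-right); return; }
        double left = values.pop();
        switch (op) {
            case '+': values.push(left + right); break;
            case '-': values.push(left - right); break;
            case '*': values.push(left * right); break;
            case '/': values.push(left / right); break;
            case '^': values.push(Math.pow(left, right)); break;
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    static double evaluate(String text) {
        String s = text.trim();
        Deque<Double> values = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        Matcher m = TOKEN.matcher(s);
        boolean expectOperand = true; // true at the start, after an operator, and after '('
        int pos = 0;
        while (pos < s.length()) {
            if (!m.region(pos, s.length()).lookingAt())
                throw new IllegalArgumentException("unexpected input at position " + pos);
            String token = m.group(1);
            pos = m.end();
            char c = token.charAt(0);
            if (c >= '0' && c <= '9') {
                if (!expectOperand) throw new IllegalArgumentException("missing operator before " + token);
                values.push(Double.parseDouble(token));
                expectOperand = false;
            } else if (c == '(') {
                if (!expectOperand) throw new IllegalArgumentException("missing operator before '('");
                ops.push(c);
            } else if (c == ')') {
                if (expectOperand) throw new IllegalArgumentException("missing operand before ')'");
                while (!ops.isEmpty() && ops.peek() != '(') apply(ops.pop(), values);
                if (ops.isEmpty()) throw new IllegalArgumentException("unbalanced ')'");
                ops.pop(); // discard the matching '('
            } else if (expectOperand) {
                if (c != '-') throw new IllegalArgumentException("missing operand before " + c);
                ops.push(NEGATE); // prefix operator: nothing to pop yet
            } else {
                // Pop operators that bind tighter, or equally tight when left-associative.
                while (!ops.isEmpty() && (precedence(ops.peek()) > precedence(c)
                        || (precedence(ops.peek()) == precedence(c) && c != '^'))) {
                    apply(ops.pop(), values);
                }
                ops.push(c);
                expectOperand = true;
            }
        }
        if (expectOperand) throw new IllegalArgumentException("expression is incomplete");
        while (!ops.isEmpty()) {
            char op = ops.pop();
            if (op == '(') throw new IllegalArgumentException("unbalanced '('");
            apply(op, values);
        }
        return values.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("3^-(2-3)")); // prints 3.0
    }
}
```

The expectOperand flag is what distinguishes a unary minus from subtraction, so inputs such as 3--2 or 3^-(2-3) need no special casing.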

This is about picking the right tool for the job. And also about recognizing that regexes are NOT the right tool for every job.

¹ If you want to explore the rabbit warren of using regexes to parse nested structures, here is an entrance.

1. Right. I don't even use regex for that. I solve nested brackets recursively by manually detecting bracket start and end positions (very simple to detect). 2. Maybe? I don't use regex for that either. Solving operator precedence is very easy. Provided that all parentheses have already been simplified, first evaluate all exponents, then all divisions, then all multiplications, and so on. Perfectly working if you ask me. 3. I don't know what you are talking about; splitting into groups is actually an advantage of regex ... — Sarthak123, Jun 15 '18 at 07:52
4. Right. If an error occurs, I agree that my calculator cannot specify exactly what caused it (it is possible, but too verbose to write). .... 5. That I am not sure about. I have never profiled my application. What I know is that it is fast and accurate to an extent that is 'excellent'. — Sarthak123, Jun 15 '18 at 07:54
I did not come here for a debate. Sorry. I just answered the questions that you asked. — Stephen C, Jun 15 '18 at 07:55
@Sarthak123 You will find that the alternative techniques that have been recommended here are several orders of magnitude faster, as well as being more reliable, and backed by sixty-plus years of computational science theory. Don't take a step backwards into the dark ages. — user207421, Jun 15 '18 at 11:11

Lino · Accepted Answer · 2018-06-15T08:15:33.190

According to your comment, changing the regex from this:

([\\d\\.]+)\\*([\\d\\.]+)

to this works:

(\\d+(\\.\\d+)?(e\\d+)?)\\^(\\d+(\\.\\d+)?(e\\d+)?)

To explain what I've changed: Before, you were allowed to enter numbers in the format:

1
.5
.......
.3.76
and so on

To overcome this, I added an optional fractional part ((\\.\\d+)?), which allows integers as well as decimals.

Also, adding an optional scientific-notation suffix ((e\\d+)?) on both sides allows the numbers to be written:

As integers (2 ^ 5)
As decimals (2.3 ^ 5.7)
And as scientific (2.345e2 ^ 5e10)

You can of course mix all variants up.
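The difference between the two number patterns can be checked directly. This is a small sketch with made-up names (NumberPatternDemo, isNumber). The third pattern, with [eE], is an assumption added here and not part of the answer's regex: Double.toString() writes the exponent marker as an uppercase E (for example 3.24E10), and a regex is case-sensitive unless told otherwise.

```java
import java.util.regex.Pattern;

final class NumberPatternDemo {
    // The question's number pattern: any run of digits and dots.
    static final Pattern LOOSE = Pattern.compile("[\\d\\.]+");
    // The number part of the answer's pattern, as written (lowercase 'e' only).
    static final Pattern STRICT = Pattern.compile("\\d+(\\.\\d+)?(e\\d+)?");
    // Assumed variant that also accepts the uppercase 'E' produced by Double.toString().
    static final Pattern STRICT_ANY_CASE = Pattern.compile("\\d+(\\.\\d+)?([eE]\\d+)?");

    static boolean isNumber(Pattern pattern, String text) {
        return pattern.matcher(text).matches(); // the whole string must match
    }

    public static void main(String[] args) {
        String[] samples = {"1", ".5", ".......", ".3.76", "2.345e2", "3.24E10"};
        for (String s : samples) {
            System.out.printf("%-8s loose=%-5b strict=%-5b anyCase=%b%n",
                    s, isNumber(LOOSE, s), isNumber(STRICT, s), isNumber(STRICT_ANY_CASE, s));
        }
    }
}
```

Note that the stricter patterns require a digit before the decimal point, so .5 would have to be entered as 0.5.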

But keep in mind the comments below your question. Regex may be useful for small pieces, but it can get pretty clunky, slow, and messy as the equations get bigger.

Also if you want to support negative numbers, you can add optional hyphens in front of the bases and the exponents:

(-?\\d+(\\.\\d+)?(e-?\\d+)?)\\^(-?\\d+(\\.\\d+)?(e-?\\d+)?)
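Here is a sketch of the signed pattern dropped into the replace-and-reset loop from the question. It assumes the pattern is applied on its own, exactly as in the question's snippet; the real calculator may order its passes differently. The names SignedPowerDemo and reducePowers are hypothetical, and [eE] is used instead of e so that the uppercase E written by Double.toString() is accepted. With the extra groups, the base is group 1 and the exponent is group 4.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SignedPowerDemo {
    // Signed number with optional fraction and exponent; contributes three capture groups.
    private static final String NUM = "(-?\\d+(\\.\\d+)?([eE]-?\\d+)?)";
    private static final Pattern POWER = Pattern.compile(NUM + "\\^" + NUM);

    static String reducePowers(String input) {
        StringBuilder expression = new StringBuilder(input);
        Matcher m = POWER.matcher(expression);
        while (m.find()) {
            double base = Double.parseDouble(m.group(1));     // whole left-hand number
            double exponent = Double.parseDouble(m.group(4)); // whole right-hand number
            expression.replace(m.start(), m.end(), Double.toString(Math.pow(base, exponent)));
            m.reset(expression);
        }
        return expression.toString();
    }

    public static void main(String[] args) {
        System.out.println(reducePowers("2^-3"));  // prints 0.125
        System.out.println(reducePowers("3-2^2")); // prints 34.0, because "-2^2" is matched as (-2)^2
    }
}
```

The second call shows the cost of the optional leading hyphen in this naive setup: the subtraction sign is absorbed into the base, so 3-2^2 collapses to 34.0 instead of -1.0. Any surrounding logic has to decide whether a hyphen is a sign or an operator before this pattern runs.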

edited Jun 15 '18 at 08:15

answered Jun 15 '18 at 08:01

Allowing leading minus signs will interfere with recognising subtraction. – rici Jun 15 '18 at 14:19
@rici no, in fact it's a plus feature. Expressions like 3--2 get evaluated correctly. – Sarthak123 Jun 16 '18 at 03:27
@sarthak that one's easy, you can include the `-` in the following number. But in `3-2` the regex will still match `-2`, which is wrong. Of course, it's all easy to parse; efficient and simple algorithms are well-known and could be copied from Wikipedia. – rici Jun 16 '18 at 04:37
why would regex match -2 in 3-2? regex is of form , it prioritizes matching operator first and then the negative in number (if present) so 3-2 is read correctly. – Sarthak123 Jun 16 '18 at 05:14
@Sarthak: Ah, I see what is being done there. You're right, it will get the right parse for negative numbers. Where it will (eventually) fail is `3^-(2-3)`. – rici Jun 16 '18 at 05:27
@rici oh right! 3^-(2-3) simplifies to 3^--1 now my calculator gets all confused what to do. Nice catch. – Sarthak123 Jun 16 '18 at 05:33
@Sarthak: This is the sort of thing which is just completely not a problem if you use a standard algorithm. You don't even notice that it happened. – rici Jun 16 '18 at 05:41
@rici I agree, and +1 for showing me the real difference between my algorithm and the superior ones. Btw, is there some set of examples for testing whether an application can handle all the tricky calculations? After this example I feel there may be more cases where it could fail. How do you all test this? – Sarthak123 Jun 16 '18 at 07:50
@sarthak: That's an excellent question, but unfortunately I don't know of any. There are thousands of calculators out there, many of them buggy, but each one has its own idiosyncratic syntax, operators and functions so it is hard to see how to create a universal repository of test cases. One of the reasons I favour use of parser generators based on context-free grammars is that it makes it possible to mechanically generate tests and to perform static analyses on the grammar itself. (A well-written grammar is also *documentation*; a soup of regular expressions is not.) – rici Jun 16 '18 at 14:06
