For an online calculator where users may enter an energy amount to calculate corresponding fees, I need the PHP script to accept various user inputs. The value of "2 million and one fourth joule" may be entered as:

2000000.25 (default notation)

2,000,000.25 (with thousands separator)

2000000,25 (comma as decimal point)

2.000.000,25 (comma as decimal point, with thousands separator)

2'000'000.25 (alternative format)

2 000 000,25 (French notation)

How could I make the script aware of such differences?

My first try was to just str_replace alternative characters with the default ones, but the period (.) may be either a decimal or a thousands separator. I tried using sscanf but how can I make sure that it reads the number correctly?

Most users will only provide two digits after the decimal point, but is there any way I can distinguish 1.234 (1 point 234, period as decimal separator) and 1.234 (one thousand two hundred thirty-four, period as thousands separator)?

Paul
  • I guess you could make an educated guess by looking at the `Accept:` header, but you should really make your users standardize on something ... split the controls into two if you have to. – Ja͢ck Nov 22 '12 at 12:51

2 Answers

There's no way to know what 1.234 means.

  • decide on a decimal separator
  • replace the last , or . with that separator
  • only allow one separator
  • limit the input to numeric values, , and .
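
The steps above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the function name `normalize_decimal` is hypothetical, disallowed characters are silently removed, and the last `.` or `,` is always taken as the decimal separator. As a consequence, `1.234` is read as 1.234 and an input such as `2,000,000` without a decimal part would be misread.

```php
<?php
// Hypothetical helper: treats the LAST '.' or ',' as the decimal separator.
function normalize_decimal(string $input): ?float
{
    // Silently drop everything except digits, '.' and ','.
    $clean = preg_replace('/[^0-9.,]/', '', $input);

    // Position of the last '.' or ',' (-1 if there is none).
    $dot   = strrpos($clean, '.');
    $comma = strrpos($clean, ',');
    $pos   = max($dot === false ? -1 : $dot, $comma === false ? -1 : $comma);

    if ($pos < 0) {
        $whole    = $clean;
        $fraction = '';
    } else {
        // Earlier separators are dropped so that only one separator remains.
        $whole    = str_replace(['.', ','], '', substr($clean, 0, $pos));
        $fraction = (string) substr($clean, $pos + 1);
    }

    if ($whole === '' && $fraction === '') {
        return null; // the input contained no digits
    }

    // Rebuild the number with '.' as the chosen decimal separator.
    return (float) (($whole === '' ? '0' : $whole) . '.' . ($fraction === '' ? '0' : $fraction));
}

echo normalize_decimal("2.000.000,25"), "\n";
```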
user247702
  • I thought so already. Any solution for the rest (leaving out that ambiguous part)? Since it's a tool that may be used by users from different countries, I don't really want to restrict the input. – Paul Nov 22 '12 at 12:59
  • Is it a real issue? I personally don't use separators in an input field. Regardless, you don't have to block the input of invalid characters, you can silently remove them. – user247702 Nov 22 '12 at 13:24
  • It is indeed a real issue. The calculator is for both private persons and small companies to calculate fees which depend on the amount of energy they are generating. While private persons may only enter values below 1000 kW, companies can easily reach several thousand or even some millions of kW. The most concerning issue is to separate British private persons (e.g. 15.75 kW) from German companies (e.g. 15.750 kW which is German notation for 15750 kW with thousands separator). I am now working on a solution using two preg_match_all expressions. – Paul Nov 22 '12 at 14:51
  • Good luck and be sure to do extensive testing :) – user247702 Nov 22 '12 at 16:00

Since I couldn't find a simple solution using built-in PHP functions, I wrote two functions: (1) check_number, which checks whether the entered string can be a number at all, and (2) convert_number, which converts it and checks whether the digit grouping is well-formed for the separators used.

I restricted the possible thousands separators to period (.), comma (,), space ( ) and apostrophe ('). The decimal separator may only be one of the first two. Both sets can be edited to allow more separators or to restrict the ones in place.

What I actually do is collect all digit groups and all separators with a couple of simple preg_match_all calls.
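
To illustrate what these calls return, here is a small sketch using one of the inputs from the question. The patterns are the same ones used in the functions below; the sample input is only for illustration.

```php
<?php
$number = "2.000.000,25";

// Collect every run of digits (the digit groups).
preg_match_all('/([0-9]+)/', $number, $num, PREG_PATTERN_ORDER);
// Collect every single non-digit character (the separators).
preg_match_all('/([^0-9]{1})/', $number, $sep, PREG_PATTERN_ORDER);

print_r($num[0]); // the digit groups "2", "000", "000", "25"
print_r($sep[0]); // the separators ".", ".", ","
```

The number of separators and whether the first and last one are identical then decide which branch of the conversion applies.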

The complete code reads as follows and should be self-explanatory, as I added a comment at each point where it returns false. I'm sure this can be simplified somehow, but it works right now and filters out many errors while still allowing some unusual combinations such as 2 000 000.25 or 2'000'000,25.

    function check_number($number) {
        if ((int) substr($number,0,1) == 0) {
            return false; // not starting with a digit greater than 0
        }
        if ((string) substr($number,-1) != "0" && (int) substr($number,-1) == 0) {
            return false; // not ending with a digit
        }
        preg_match_all('/([^0-9]{2,})/', $number, $sep, PREG_PATTERN_ORDER);
        if (isset($sep[0][0])) {
            return false; // more than one consecutive non-digit character
        }
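        preg_match_all('/([^0-9]{1})/', $number, $sep, PREG_PATTERN_ORDER);
        if (array_diff($sep[0], array("'", ".", ",", " "))) {
            return false; // contains a character that is not an allowed separator
        }
        if (count($sep[0]) > 2 && count(array_unique($sep[0])) > 2) {
            return false; // more than 2 different separators
        }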
        elseif (count($sep[0]) > 2) {
            $last_sep = array_pop($sep[0]);
            if (!in_array($last_sep,array(".",","))) {
                return false; // separator not allowed as last one
            }
            $sep_unique = array_unique($sep[0]);
            if (count($sep_unique) > 1) {
                return false; // not all separators (except last one) are identical 
            }
            elseif (!in_array($sep_unique[0],array("'",".",","," "))) {
                return false; // separator not allowed
            }
        }
        return true;
    }

    function convert_number($number) {
        preg_match_all('/([0-9]+)/', $number, $num, PREG_PATTERN_ORDER);
        preg_match_all('/([^0-9]{1})/', $number, $sep, PREG_PATTERN_ORDER);
        if (count($sep[0]) == 0) {
            // no separator, integer
            return (int) $num[0][0];
        }
        elseif (count($sep[0]) == 1) {
            // one separator, look for last number column
            if (strlen($num[0][1]) == 3) {
                if (strlen($num[0][0]) <= 3) {
                    // treat as thousands separator
                    return (int) ($num[0][0] * 1000 + $num[0][1]);
                }
                elseif (strlen($num[0][0]) > 3) {
                    // must be decimal point
                    return (float) ($num[0][0] + $num[0][1] / 1000);
                }
            }
            else {
                // must be decimal point
                return (float) ($num[0][0] + $num[0][1] / pow(10,strlen($num[0][1])));
            }
        }
        else {
            // multiple separators, check first and last
            if ($sep[0][0] == end($sep[0])) {
                // same character, only thousands separators, check well-formed nums
                $value = 0;
                foreach($num[0] AS $p => $n) {
                    if ($p == 0 && strlen($n) > 3) {
                        return -1; // malformed number, incorrect thousands grouping
                    }
                    elseif ($p > 0 && strlen($n) != 3) {
                        return -1; // malformed number, incorrect thousands grouping
                    }
                    $value += $n * pow(10, 3 * (count($num[0]) - 1 - $p));
                }
                return (int) $value;
            }
            else {
                // mixed characters, thousands separators and decimal point
                $decimal_part = array_pop($num[0]);
                $value = 0;
                foreach($num[0] AS $p => $n) {
                    if ($p == 0 && strlen($n) > 3) {
                        return -1; // malformed number, incorrect thousands grouping
                    }
                    elseif ($p > 0 && strlen($n) != 3) {
                        return -1; // malformed number, incorrect thousands grouping
                    }
                    $value += $n * pow(10, 3 * (count($num[0]) - 1 - $p));
                }
                return (float) ($value + $decimal_part / pow(10,strlen($decimal_part)));
            }
        }
    }

I am aware of one flaw in this pair of functions: 1.234 or 1,234 will always be treated as the whole number 1234, because the function assumes that a single separator is a thousands separator if fewer than four digits precede it and exactly three follow it.
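
One possible way to deal with this case is to detect it rather than guess, so that the calling code can ask the user to confirm the value. The following is a minimal sketch; `is_ambiguous_number` is a hypothetical helper that is not part of the functions above, and it assumes that only `.` and `,` can be ambiguous.

```php
<?php
// Hypothetical helper: true when a single '.' or ',' is preceded by one to
// three digits (no leading zero) and followed by exactly three digits,
// e.g. "1.234" or "15,750". Such input can be read as a decimal or as thousands.
function is_ambiguous_number(string $number): bool
{
    return preg_match('/^[1-9][0-9]{0,2}[.,][0-9]{3}$/', $number) === 1;
}

var_dump(is_ambiguous_number("15.750"));
```

Inputs such as 15.75 or 2.000.000 do not match, since the number of digits after the separator or the repeated separator already settles how they must be read.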

Paul