2

I have this code (was already there, isn't mine):

SELECT
    a.id_original_contrato AS contrato,
    ( CASE WHEN d.value~'^\\d+$' THEN d.value::integer ELSE 0 END ) AS monto,
    EXTRACT(YEAR FROM b.value)::integer AS anoinicio,
    EXTRACT(YEAR FROM c.value)::integer AS anofin

... etc (some JOIN's and WHERE's)

Let me explain: d.value comes from a table where value is character varying (200). The code will insert later the d.value (now called 'monto') in another table as a integer. Someone coded that regex in order to extract some chars or in other case (ELSE), define it as 0. Those values works when they are integer only. If I use a d.value like 76.44 it doesn't work due that regex, it always define it as 0.

Well, I have to change that code, because:

  • I need to store the d.value in the new table as numeric, not as integer anymore (In my new table the data type is numeric now)
  • But first, I need to correct that regex, because is messing my numeric numbers like 76.44, or 66,56 (dot or coma).

I'm not sure what that regex is doing. And how could I accomplish what need with a better or new regex?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
pmiranda
  • 7,602
  • 14
  • 72
  • 155

2 Answers2

2

Choose a variant:

with v(value) as (
    values
    ('12,3'),
    ('12.3'),
    ('123'),
    ('123.'),
    ('.123'),
    ('1.2.3')
    )

select 
    value, 
    value ~ '^(\d+[,\.]\d+|\d+)$' as variant_a,
    value ~ '^(\d*[,\.]\d*|\d+)$' as variant_b,
    value ~ '^\d+[,\.]\d+$' as variant_c
from v;

 value | variant_a | variant_b | variant_c 
-------+-----------+-----------+-----------
 12,3  | t         | t         | t
 12.3  | t         | t         | t
 123   | t         | t         | f
 123.  | f         | t         | f
 .123  | f         | t         | f
 1.2.3 | f         | f         | f
(6 rows)

To convert a string with a dot or a comma to numeric use replace():

select replace(value, ',', '.')::numeric;   
klin
  • 112,967
  • 15
  • 204
  • 232
2

The double backslash in \\d suggests an old version with standard_conforming_strings = off. The manual:

Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off).

In modern versions with standard_conforming_strings = on, this string makes little sense as regular expression: '^\\d+$'. To detect strings consisting of one or more digits use either E'^\\d+$' (prefixed with E) or '^\d+$'. Details:

Integer literals also allow an optional leading sign for negative / positive numbers, and leading / dangling white space in Postgres.
So, this is the complete regular expression for valid integer literals:

CASE WHEN d.value ~ '^\s*[-+]?\d+\s*$' THEN d.value::int ELSE 0 END

The regular expression explained:

^ .. start of string
\s .. class shorthand for [[:space:]] (white space)
* .. quantifier for 0 or more times
[+-] .. character class consisting of + and -
? .. quantifier for 0 or 1 times
\d .. class shorthand for [[:digit:]] (digits)
+ .. quantifier for 1 or more times
\s* .. same as above
$ .. end of string

Consider the syntax rules for numeric string literals. One essential quote:

There cannot be any spaces or other characters embedded in the constant

That's because a numeric constant is not quoted, hence white space is not possible. Not applicable for casting strings. White space is tolerated:

Leading, trailing and right after the exponent char.

So these are all legal strings for the cast to numeric:

'^\s*[-+]?\d*\.?\d+(?:[eE]\s*[-+]?\d+)?\s*$'

The only new element are parentheses (()) to denote the contained regular expression as atom. Since we are not interested in back references, use "non-capturing": (?:...) and append a question mark (?:[eE]\s*[-+]?\d+)? to mean: the "exponential" part can be added or not, as a whole.

Assuming dot (.) as decimal separator. You might use comma instead (,) or [,\.] to allow either. But only dot is legal for the cast.

Test:

SELECT '|' || lit || '|' AS text_with_delim
     , lit ~ '^\s*[-+]?\d*\.?\d+([eE]\s*[-+]?\d+)?\s*$' AS valid
     , lit::numeric AS number
FROM   unnest ('{1
               , 123
               , 000
               , "  -1     "
               , +2
               , 1.2
               , .34
               , 5e6
               , " .5e   -6  "
                }'::text[]) lit;

Result:

text_with_delim valid number
|1| t 1
|123| t 123
|000| t 0
| -1 | t -1
|+2| t 2
|1.2| t 1.2
|.34| t 0.34
|5e6| t 5000000
| .5e -6 | t 0.0000005

Or you might have use for to_number() to convert strings of arbitrary given format.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I see, hmm. The coded is made to work in 9.2, and now it will run in 9.4. I'm still reading and trying to get your answer. Thanks (+1) – pmiranda Feb 14 '17 at 12:13
  • Great, only by using the '^\s*[-+]?\d*\.?\d+([eE]\s*[-+]?\d+)?\s*$' regex I can make it works! My Postgresql is set to US in postgresql.conf like this: `lc_numeric = 'en_US.UTF-8'` so by now I can only use `.` as decimal separator, but in production enviroment is set as `lc_numeric = 'es_ES.UTF-8' ` so i think I won't have problems there. I'll keep reading and doing all what you have written here Erwin, thanks you. – pmiranda Feb 14 '17 at 15:54
  • 1
    @pmirnd: `lc_numeric` influences the behavior of functions like `to_char()`, but the the *type cast* from `text` to `numeric` does *not* depend on locale settings. That would be madness. So, a comma is never legal in a *cast* to numeric. You'll have to replace it with `replace()` or `translate()`. Or use `to_number()`. – Erwin Brandstetter Feb 14 '17 at 16:15
  • Finally, what I did and now works ok is: `select cast(replace(d.value, ',', '.') as numeric) AS monto` . ta-da! – pmiranda Feb 14 '17 at 16:38