Convert hex to utf8 in greenplum in regexp_replace

Question

I have strings in a table that contain hex values such as \ffffffc4. An example is the following:

Urz\ffffffc4\ffffff85dzenie zgodne ze standardem High Definition Audio

The following code can convert the hex into UTF8:

select chr(x'c4'::int)

which returns Ä but when I try to use a regexp_replace I get into problems. I have tried the following:

select regexp_replace(sal_input, E'\\f{6}(..)',convert(E'\\1','xyz','UTF8'),'g')

where XYZ are the various source encodings offered in 8.2 but all I get back is the hex value.

Any idea on how I could use the chr function inside regexp_replace?

Version used: PostgreSQL 8.2.15 (Greenplum Database 4.1.1.1 build 1) on x86_64-unknown-linux-gnu

Thanks in advance for the help

I just realized that the double backslashes that I had in the SQL code were converted into single backslashes. So, the above code should have \\ instead of \ — apostolos1975, Sep 19 '12 at 10:08
BTW, why such an ancient PostgreSQL? Does Greenplum not keep up with mainline releases? — Craig Ringer, Sep 19 '12 at 12:25
not sure why they are not upgrading but I am guessing that this due to the fact that this is a production system and downtime is NO-NO — apostolos1975, Sep 19 '12 at 17:05
@CraigRinger GP is based on 8.2.15 but some Postgres features are backported(?) / used and cool enhancements are never shared with Postgres community. — mys, Nov 16 '12 at 14:19

Craig Ringer · Answer 1 · 2012-09-20T00:21:09.407

1

You are misunderstanding the order of evaluation. The 2nd argument to regexp_replace isn't a callback invoked for every substitution of '\1'.

What happens is that your convert call is evaluated first, on the literal value \1, and that result is passed to regexp_replace.

In any case, the SQL doesn't even evaluate on a modern PostgreSQL because of stricter casting rules, as '\1' isn't a valid bytea literal.

In a less ancient Pg version it might be possible to do something with regexp_split_to_table, chr and string_agg. In 8.2, I think you're going to be using a PL. I'd load PL/Perl and write a simple Perl function to do it. It's likely possible to implement in PL/PgSQL, but I suspect any implementation with the functionality available in 8.2 will be verbose and slow. I'd love to be proved wrong.

edited Sep 20 '12 at 00:21

answered Sep 20 '12 at 00:15

Craig Ringer

307,061
76
688
778

But the documentations for regexp_replace says the following regexp_replace(source, pattern, replacement [, flags ]) I am assuming that the second argument (in this case E'\\f{6}(..)') needs to be evaluated first in order to find the substrings (if any) that will be replaced and the outcome (\1) is replaced according to the third parameter which is the convert(E'\\1','xyz','UTF8'). – apostolos1975 Sep 20 '12 at 09:05
@apostolos1975 If you call `f(g('x'))`, the order of evaluation is *always*, irrespective of the function, to evaluate `g('x')` to produce an intermediate result we'll call `gx`, then evaluate `f(gx)` to produce the final result. If `g(col)` is calling `g` on column-reference `col` then for each value of `col` `g(col)` is evaluated and passed to `f`. **`f` is not and can not ever be evaluated before `g`**. (There are some lazy-evaluated functional languages where that may not be true, but PostgreSQL and SQL in general are not one of them). – Craig Ringer Sep 20 '12 at 11:01
... so *first*, `convert(E'\\1','xyz','UTF8')` is evaluated on the *literal argument* `E'\\'`, then the *result* is passed as the 3rd parameter to `regexp_replace`. You appear to want `convert` to be executed for each substitution found by `regexp_replace`, but that just isn't how it works. – Craig Ringer Sep 20 '12 at 11:08

Convert hex to utf8 in greenplum in regexp_replace

1 Answers1