2

I am loading data dump from external source and some strings contain \uXXXX sequences for the UTF8 chars, like this one:

\u017D\u010F\u00E1r nad S\u00E1zavou

I can check the contents by using E'' constant in psql, but cannot find any function/operator to return me proper value.

I'd like to ask, if it's possible to convert this string with unicode escapes into normal UTF8 without using PL/pgSQL functions?

vyegorov
  • 21,787
  • 7
  • 59
  • 73
  • The \E'' literal format is not a psql client thing -- any SQL code running on the PostgreSQL server can use that syntax. You can put the string into a literal like `E'\u017D\u010F\u00E1r nad S\u00E1zavou'` to have it properly interpreted; what else would you like to do? – kgrittn Apr 11 '12 at 19:48
  • I want to be able to do something like: `UPDATE table SET proper = somefunc('\u017D\u010F\u00E1r nad S\u00E1zavou') WHERE id=1;`. And get the expected UTF8 string. – vyegorov Apr 11 '12 at 19:54
  • If the database is using UTF8 encoding, you should be able to do something like: `UPDATE table SET proper = E'\u017D\u010F\u00E1r nad S\u00E1zavou' WHERE id=1;` What do you get when you run `SHOW server_encoding;`? How about `SHOW client_encoding;`? – kgrittn Apr 11 '12 at 20:45
  • Encoding is fine, `UTF8`. Problem is that I cannot use literal constants. I need a table-wide query, like: `UPDATE table SET proper = somefunc(badtext);` for a 8M row table. Copy-paste is not an option. – vyegorov Apr 11 '12 at 21:04
  • Where is `badtext` coming from? What encoding is it in? Is this in a disk file where you could just run a encoding conversion utility on it? Depending on context, one of the convert_* functions in this table might possibly help: http://www.postgresql.org/docs/9.1/static/functions-string.html#FUNCTIONS-STRING-OTHER – kgrittn Apr 11 '12 at 21:09
  • The mentioned `\uXXXX` sequences are not the result of the bad encoding conversion. They're just there, literally. – vyegorov Apr 11 '12 at 22:01
  • I found and fixed a weakness in my answer. – Erwin Brandstetter Mar 02 '13 at 21:55
  • @CraigRinger, this question was asked 2 years ago, yet you've marked it as a duplicate for a quite recent one. Should be the other way around, don't you think so? – vyegorov May 15 '14 at 09:37
  • It's not just age, it's also how searchable it is, how clear the question is, and how good the answer(s) are. In this case I'd prefer to merge them entirely, but SO doesn't provide for that. Note that there's nothing bad about something being marked a duplicate, it's not a statement that there's something wrong with the question, just that there are similar ones out there. – Craig Ringer May 15 '14 at 11:29

1 Answers1

4

I don't think there is a built in method for that. Easiest way I can think of is the plpgsql function you wanted to avoid:

CREATE OR REPLACE FUNCTION str_eval(text, OUT t text)
  LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
BEGIN
   EXECUTE 'SELECT E''' || replace($1, '''', '''''') || ''''
   USING $1
   INTO t;
END
$func$;

The updated version safeguards against SQLi and is faster, too.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Erwin Brandsetter, Thanks ! – ceadreak Jun 02 '16 at 16:06
  • Hum... It is not working, no string change at `SELECT E'Ceci\u008Dlia'`. Expected Cecília. Same problem when using in `select str_eval(x) from t(x)`. – Peter Krauss Jun 16 '20 at 17:43
  • 1
    @PeterKrauss: `E'Ceci\u00EDia'` works for me. – Erwin Brandstetter Jun 19 '20 at 02:17
  • oops, sorry Erwin (and readers) my fast and non-checked comment. `\u008D` is not a diacrilic, it is a non-printable ASCII... Was a "clean bad encode" problem, [solved here](https://stackoverflow.com/a/62416723/287948). – Peter Krauss Jun 20 '20 at 08:52
  • Now a really important comment... About your 2012 parse-string function `str_eval()` and PostgreSQL project: why in nowadays (2020) there are [no internal function for string unescape](https://www.postgresql.org/docs/12/functions-string.html)? The *3k pageviews here + [5k pageviews here](https://stackoverflow.com/q/20124393/287948)* shows that many programmers was looking for it (!)... Small continuous quality improvement needs small democracy, but there is no "user community demand" consensus mechanism to **vote on small optimizations** for pg library. – Peter Krauss Jun 20 '20 at 08:56