4

Is there any function which can replace all characters except ascii 32 to 127 and ascii 0, 13, 27 in postgres sql. I do not want to replace spaces, line feeds etc. I want to replace weird characters like club signs, square or a weird asterisk.

I tried modifying regexp_replace like below but it is not working.

select regexp_replace('abc$wanto&tore9046move#special~04 chars', '[^\x00-\x7f]', '', 'g') 
--This is giving error ERROR: 22021: invalid byte sequence for encoding "UTF8": 0x00

select *, regexp_replace('abc$wanto&tore9046move#special~04 chars', '[^[:ascii:]]', '', 'g')
--This one is taking everything beyond 255 also in the set. 

Thanks much for your time and help

Nik
  • 371
  • 4
  • 15

2 Answers2

3

Try unicode range:

select regexp_replace('abc$wanto&tore9046move#special~04 chars', '[\u0080-\u00ff]', '', 'g')

Reference

This will remove any character in 128-255 ascii range.

anubhava
  • 761,203
  • 64
  • 569
  • 643
2

You were almost right with:

select regexp_replace('abc$wanto&tore9046move#special~04 chars', '[^\x00-\x7f]', '', 'g') 

but the null byte \x00 is not valid in PostgreSQL string literals, so you'd have to start at \x01. Your desired range starts at 32 (0x20), so use that plus a few specific inclusions for 13 (0x0d) and 27 (0x1b):

select regexp_replace('abc$wanto&tore9046move#special~04 chars', '[^\x20-\x7f\x0d\x1b]', '', 'g')

or, with a more useful input:

regress=> select regexp_replace('aáiï*∞ıb ', '[^\x20-\x7f\x0d\x1b]', '', 'g');
 regexp_replace 
----------------
 ai*b 
(1 row)
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • That last regex is the only one that really does the job that is likely needed since the others assume there are no non-printable characters in the lower range, which isn't accurate. – hemp Dec 26 '19 at 07:40