I've been trying to convert a unicode regex to POSIX regex to remove \p{So}
, \p{Cs}
, \p{Cn}
and \x1A
type of characters from a column.
In Informatica I was using reg_replace (col_name,'[\p{So}\p{Cs}\p{Cn}\x1A]',' ')
function to filter out these kind of characters and replace them with a single space.
However, when migrating to RedShift the same regex is not working with REGEXP_REPLACE()
as it does not support the Unicode blocks.
For e.g. this is the string :-
'INT¡®L n°1 di KONGRESSE Ä Ê Í EUROREGIONE. V.Ž?#'
From above string, I need to keep the Latin / French letters, alphanumeric & punctuations and just remove the other symbols.