4

I need to clean one character column and for that I am using REGEXP_REPLACE function in Teradata 14.

The same piece of code worked for some other data source (having the same LATIN encoding).

The data definition using show table has given me below format of the data:

CREATE SET TABLE pp_oap_cj_t.dc_loss_fdr_kn ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
( PARENT_ID DECIMAL(38,0),
 FS_MRCH_NM VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PARENT_ID );

The query I am performing is as below:

CREATE TABLE pp_oap_pb_piyush_t.CHECKMERCHANT1 AS (
 SELECT 
 FS_MRCH_NM,
REGEXP_REPLACE(trim(Upper(trim(REGEXP_REPLACE( (FS_MRCH_NM ) , '[^a-z]',' ',1,0,'i'))) ), '[[:space:]]+',' ',1,0,'i')   as cleaned_merchant
 FROM pp_oap_pb_piyush_t.CHECKMERCHANT)
  WITH DATA PRIMARY INDEX (FS_MRCH_NM);

Error

CREATE TABLE Failed. 6706:  The string contains an untranslatable character.

I need a quick turnaround this bottleneck.

Help is really appreciated ! Thanks !!!!

Md. Rezwanul Haque
  • 2,882
  • 7
  • 28
  • 45
Piyush Upadhyay
  • 41
  • 1
  • 1
  • 3

4 Answers4

5

REGEXP_REPLACE under the hood converts character set Latin to Unicode. You have defined your variable as character set Latin. You see the error when data has something which cannot be converted from Latin to Unicode. Best thing is to fix your DDL to have character set as Unicode instead of Latin. something like TRANSLATE(FS_MRCH_NM USING LATIN_TO_UNICODE WITH ERROR) in your code instead of FS_MRCH_NM should work. Problem with this it result in null values when you have untranslatable characters.

Kiran
  • 3,255
  • 3
  • 14
  • 21
  • Hi Kiran, Really appreciate the help! I used the suggested solution, but still throw the same error. What could be the probable reason? Thanks! – Piyush Upadhyay Jul 20 '17 at 22:52
  • Hi Kiran, I should correct my self. It worked! Just one thing, it should be LATIN_TO_UNICODE. Again, really appreciate the help! Thanks a ton!!!!!! – Piyush Upadhyay Jul 20 '17 at 23:32
  • Hi Piyush, Changed LATIN_TO_UNICODE in the answer. If you feel the answer has helped you then please accept it as solution. – Kiran Jul 21 '17 at 13:21
1
UPDATE CFSYSUAT.Metadata
SET MetadataTxt = 
CASE WHEN TRANSLATE_CHK(MetadataTxt USING UNICODE_TO_LATIN) > 0
THEN SUBSTRING(MetadataTxt,1,TRANSLATE_CHK(MetadataTxt USING UNICODE_TO_LATIN) - 1) || 
     SUBSTRING(MetadataTxt,TRANSLATE_CHK(MetadataTxt USING UNICODE_TO_LATIN) + 1) 
ELSE MetadataTxt END;

Had some luck with TRANSLATE_CHK. It returns the position of the offending character. Used it with SUBSTRING to remove the offending character. If the text contains multiple bad characters you have to run the update multiple times, each pass will correct another bad character.

HTH, Nathan

NathanH
  • 81
  • 1
  • 3
0

A non-Unicode-compatible version of oreplace is installed in our syslib, and a Unicode-compatible version is in our td_sysfnlib. When the database is not specified, syslib is used before td_sysfnlib. So forcing TD to use the td_sysfnlib version of oreplace solved the problem.

Here's the code used:

SELECT td_sysfnlib.OREPLACE(item_name,'|','') FROM databaseB.sales;

I hope that helps anyone else who's running into the same issue!

exploitr
  • 843
  • 1
  • 14
  • 27
-1
TRANSLATE(OREPLACE(TRANSLATE(item_name USING LATIN_TO_UNICODE WITH ERROR),'|','') USING UNICODE_TO_LATIN WITH ERROR) AS LBL
double-beep
  • 5,031
  • 17
  • 33
  • 41
Younes
  • 1