1

My table contains a field lastName that is UNICODE

I need to implement the SOUNDEX function, but it won't work because it requires only latin. So I tried converting it to latin but still get the same error only latin letters allowed

Here is what i tried

SEL * 
FROM tab_test
WHERE SOUNDEX(REGEXP_REPLACE(lastName, '[^A-Z]', '')) = 'smith' 



SEL * 
FROM tab_test
WHERE SOUNDEX(TRANSLATE(lastNameUSING unicode_to_latin)) = 'smith' 

How can I correct the issue


EDIT

Here is what i tried as dnoeth suggests. But it still does not work

     SEL * 
     FROM tab_test
     WHERE SOUNDEX(TRANSLATE(REGEXP_REPLACE(
     lastName,'[^a-zA-Z]','') USING   
     UNICODE_TO_LATIN))  = 's530' 
     

EDIT 2

Here is the query that does not work

 SEL  lastName, REGEXP_REPLACE(lastName, '[^a-zA-Z]', '') lastName_regex 
 FROM (SEL * 
 FROM tab_test
 WHERE personId < 10 
 ) der 
 WHERE SOUNDEX(REGEXP_REPLACE(lastName, '[^a-zA-Z]', '')) = 's530'  

The result of the REGEXP sub query is the following

SEL  lastName, REGEXP_REPLACE(lastName, '[^a-zA-Z]', '') lastName_regex 
 FROM (SEL * 
 FROM tab_test
 WHERE personId < 10 
 ) der  

Result is copied directly from teradata sql assistant

    LASTNAME    lastName_regex
1   Smith                   Smith              
2   Smith                   Smith              
3   Smith                   Smith              
4   Smith                   Smith              
5   Smith                   Smith              
6   Smith                   Smith              
7   Smith                   Smith              
8   Smith                   Smith              
Community
  • 1
  • 1
Pasha
  • 181
  • 1
  • 1
  • 13

3 Answers3

3

Your regex removes lowercase a to z, too.

So try either

REGEXP_REPLACE(lastName, '[^a-zA-Z]', '') -- explicitly add lowercase

or

REGEXP_REPLACE(lastName, '[^A-Z]', '',1,0,'i')) -- do a case insensitive comparison

Btw, the result of a SOUNDEX('smith') is not 'smith', but 's530'.

dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • I still get the same error. THe regex does not help – Pasha Aug 05 '15 at 15:29
  • This should work, strange. What's your exact Teradata release (`select * from dbc.dbcinfoV where InfoKey = 'Release'`)? Can you show some of the bad data, just the result of REGEXP_REPLACE? – dnoeth Aug 05 '15 at 16:45
  • The release is `14.00.07.14` The bad data looks normal. I copied it directly from the SQL assistant. This small sample above does not work – Pasha Aug 05 '15 at 19:05
  • There must be something else, the data you showed can't result in an error when using SOUNDEX. – dnoeth Aug 05 '15 at 21:02
0

You might need

[^a-zA-Z]+

i.e, the "+"

Al Foиce ѫ
  • 4,195
  • 12
  • 39
  • 49
Joe
  • 1
  • 1
0

Try USING UNICODE_TO_LATIN WITH ERROR

Diego
  • 812
  • 7
  • 25