
I need to run an update on data containing French characters in Teradata. The problem is that I don't know where to find a conversion list. For example:

É --> É

î --> Î

Ö --> Ö

Where can I find the complete list of corresponding symbols for each French character? Or is there a way to do the update using built-in functions?

Thank you.

10k
  • It appears you have UTF8 data (inherently variable length) but are viewing it as single-byte extended ASCII. Load the data with UTF8 session character set, or export it as ASCII and reload as UTF8. It's also possible to write a Java or C UDF to convert UTF8 variable-length byte sequences to their fixed length character equivalents, e.g. see [Wikipedia UTF8](https://en.wikipedia.org/wiki/UTF-8). Accented Roman letters (including those used in French) should be included in Teradata LATIN, so either LATIN or UNICODE for the target column should be fine. – Fred Aug 01 '22 at 15:30
  • I don't think this is going to be possible, since our DBA team would have to do what you suggested and they won't (I guess they would have to make some adjustments). Is there something else I can do at my level? I can't use the export-import method since we receive data on a daily basis. – 10k Aug 01 '22 at 19:25
  • The accented LATIN characters are all in the 0xC0 - 0xFF range so the corresponding UTF8 encoding will be 2-byte sequences from 0xC3 0x80 through 0xC3 0xBF. It's going to be difficult to do the conversion without reloading the data or using a UDF. Expressed using CHR function (so decimal instead of hex) you would be converting `CHR(195)||CHR(128)` to `CHR(192)`, `CHR(195)||CHR(129)` to `CHR(193)`, ..., `CHR(195)||CHR(191)` to `CHR(255)`. – Fred Aug 02 '22 at 16:42
  • Actually, that's not quite true. The OE ligature and "Y with diaeresis" don't map in that same sequence. And for those it matters whether the Teradata column is defined as LATIN or UNICODE. (The eth, thorn, and Euro sign also don't follow the mapping above, but those are not French letters.) I suppose you could use `OREPLACE()` to map each multiple character sequence to the desired single character but you'd potentially have to apply the function many (34?) times to cover all the possible French letters. – Fred Aug 02 '22 at 19:16
  • And that's exactly what I did in the first place (using the OREPLACE function). Example: `RegExp_Replace(OReplace(RegExp_Replace(RegExp_Replace(RegExp_Replace(RegExp_Replace(OReplace(RegExp_Replace(RegExp_Replace(field_name, 'É|é|è|É|È|Ê','E') ,'Å"|¼|Å''|Å''''','OE'), '?', ' '), 'ô|Ö','O'), '’',''''), 'ç|Ç','C'), 'Â|"|€¦|€"|µ|ù',''), '§','&'), 'î|Ã','I')`. Nevertheless, I was looking for an easier way. Anyway, thanks for your time. – 10k Aug 03 '22 at 17:35
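
The mapping Fred describes in the comments (`CHR(195)||CHR(128)` to `CHR(192)` through `CHR(195)||CHR(191)` to `CHR(255)`) is regular enough to generate rather than look up. Below is a minimal Python sketch that builds that conversion list, applies it to a string, and emits a nested `OREPLACE` expression for chosen code points. It assumes the data really is UTF-8 viewed as a single-byte Latin-1 style character set; the function names and the `field_name` column are hypothetical, and whether `CHR()` yields the intended characters depends on the column's character set, so the generated SQL should be checked on a sample first.

```python
import re

def build_conversion_list():
    """Return (garbled, intended) pairs for U+00C0..U+00FF."""
    pairs = []
    for code_point in range(0xC0, 0x100):
        intended = chr(code_point)
        # UTF-8 encodes these as 0xC3 followed by one byte in 0x80..0xBF.
        utf8_bytes = intended.encode("utf-8")
        # Reading those two bytes as Latin-1 gives the garbled two-character form.
        garbled = utf8_bytes.decode("latin-1")
        pairs.append((garbled, intended))
    return pairs

def fix_text(text):
    """Replace each garbled 0xC3 + 0x80..0xBF pair in a single pass."""
    return re.sub(
        "\u00c3([\u0080-\u00bf])",
        lambda m: chr(ord(m.group(1)) + 0x40),
        text,
    )

def build_oreplace_sql(column, code_points):
    """Nest one OREPLACE call per target code point (decimal CHR values)."""
    expr = column
    for cp in code_points:
        second_byte = 0x80 + (cp - 0xC0)
        expr = f"OREPLACE({expr}, CHR(195)||CHR({second_byte}), CHR({cp}))"
    return expr

if __name__ == "__main__":
    for garbled, intended in build_conversion_list():
        print(repr(garbled), "-->", intended)
    # 0xC9 is "É", 0xE9 is "é"; "field_name" is a placeholder column name.
    print(build_oreplace_sql("field_name", [0xC9, 0xE9]))
```

As noted in the comments, the OE ligature and "Y with diaeresis" fall outside this range and do not follow the same pattern, so they would still need separate handling.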

0 Answers