1

I am using talend to transfer data from old database to new database. Old database is PostgreSQL and new one is MySQL.

But I have problem, because in old database there are these rows:

id      lft     rgt     rank_id    name_sci
------------------------------------------------------
6530    6137    6138    88         Ocythoë tuberculata
6340    6127    6128    88         Ocythoe tuberculata

Mind the two dots above e

So I have

tPsqlInput --> tFilterRow --> tMap --> tMysqlOutput

However, I am getting error: Duplicate entry 'Ocythoe tuberculata' for key 'bio_taxons_name_sci_unique' If I put tLogRow component just before inserting into MySQL, i get correct output: Ocythoë tuberculata and Ocythoe tuberculata;

I have tried to directly insert into database:

into bio_taxons (name_sci, created_by, taxonomic_rank_id ) values ('Ocythoë', 1, 10);

And it works, and insertes that e with dots correctly, and when I do SELECT it shows that it has successfully inserted above row.

But from talend, for some reason it replaces ë (e with two dots) with ordinary e (without dots).

And i do not know how to solve this. I have tried editing MySQL connection but nothing seems to work:

Here's the current mysql connection string:

jdbc:mysql://:3306/fis?noDatetimeStringSync=true&characterEncoding=Cp1252

I have tried with values "utf8", "utf-8", "latin1" "utf8mb4" but none of it worked...

Also in mysql:

mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| latin1                   | latin1_swedish_ci    |
+--------------------------+----------------------+
clzola
  • 1,925
  • 3
  • 30
  • 49

1 Answers1

1

I recommend using utf8 character set and corresponding collation across the board, so that you accommodate for every character (If you also need emoji go for utf8mb4).

In order to convert your database to UTF8:

ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci

You may also need to change server encoding if it's not already in utf8. Check it with:

SHOW VARIABLES WHERE variable_name LIKE 'char%'

Then instruct the JDBC driver to use utf8 enconding. Your connection string becomes:

jdbc:mysql://:3306/fis?noDatetimeStringSync=true&useUnicode=true&characterEncoding=utf-8
Ibrahim Mezouar
  • 3,981
  • 1
  • 18
  • 22