3

I am trying to insert UTF8 strings into PostgreSQL, using DBD::Pg and always get double-encoded UTF8 in the database. I only get the correct characters in the DB if the strings I insert are Latin1.

How can I tell DBD::Pg to not re-encode my strings which are already UTF8?

Test script:

!/usr/bin/perl

use strict;

my $PGDB      = 'dbi:Pg:dbname=test;host=localhost';
my $PGDB_USER = 'username';    
my $SQL_INSERT = 'INSERT INTO tmp1 (t1, t2) VALUES (?, ?)';

use DBI;

my $dbh = DBI->connect($PGDB, $PGDB_USER)
    || die "Couldn't connect to $PGDB as user $PGDB_USER: $DBI::errstr\n";

#$dbh->do("SET client_encoding TO UTF8");

my $sth = $dbh->prepare( $SQL_INSERT )
    || die "Can't prepare insert statement $SQL_INSERT: $DBI::errstr";

my $cp1252 = "\xe9 voil\xe0";
my $utf8   = "é voilà";

utf8::upgrade($utf8);

use utf8;
#use bytes;

my $text = 'sent utf8 w. utf8::upgrade';

$sth->execute($utf8, $text) or die $sth->errstr, "\n";

The resulting table after a few tests:

é voilà     sent cp1252 as_is w. use bytes
é voilà   sent utf8 as_is w. use bytes
é voilà   sent utf8 as_is w. use utf8
é voilà     sent cp1252 as_is w. use utf8
é voilà     sent cp1252 as_is w. do(SET client_encoding TO UTF8)
é voilà   sent utf8 as_is w. do(SET client_encoding TO UTF8)
é voilà   sent utf8 as_is w. use utf8 + do(SET client_encoding TO UTF8)
é voilà   sent utf8 w. utf8::upgrade + do(SET client_encoding TO UTF8)
é voilà   sent utf8 w. utf8::upgrade

(This is on Ubuntu 16.04 with DBD::Pg version 3.5.3. I did not have this problem with the DBD::Pg version that installed on Ubuntu 12.04)

mivk
  • 13,452
  • 5
  • 76
  • 69
  • 4
    Your `use utf8` is too late. Move it to the top of the file and drop the `utf8::upgrade`. (This is assuming that your source is encoded as UTF-8?) – Borodin Mar 12 '17 at 21:56
  • @Borodin : I thought all `use` statements were always executed before anything else anyway. But it looks like moving it to the top actually makes it work! – mivk Mar 12 '17 at 22:41
  • 2
    @mivk, They are, but pragmas usually restrict their effect to the lexical scope in which they are declared, so that you can term them on and off at will. – ikegami Mar 13 '17 at 06:51
  • 2
    Please provide the output of [this program](http://pastebin.com/JimvZgUy). (I don't have a ProgreSQL db) – ikegami Mar 13 '17 at 07:14
  • @ikegami: not sure how to make it output something useful. As it is, it just complains `DBD::Pg::db selectrow_array failed: ERROR: could not determine data type of parameter $1 at ... line 38`. – mivk Mar 13 '17 at 11:31
  • What's the "client encoding" (SQL command `SHOW client_encoding;`)? If it's not set to UTF-8, this might be caused by [this bug](https://rt.cpan.org/Public/Bug/Display.html?id=103137) fixed with [this pull request](https://github.com/bucardo/dbdpg/pull/17) (not released yet). – nwellnhof Mar 13 '17 at 12:12
  • @nwellnhof: it's UTF8 – mivk Mar 13 '17 at 14:09
  • Note that if you change `client_encoding` from a different value to UTF-8 with `do`, you must also set [`pg_enable_utf8`](https://metacpan.org/pod/DBD::Pg#pg_enable_utf8-(integer)) to 1 (or set it to -1 after changing the encoding). – nwellnhof Mar 13 '17 at 16:20
  • 1
    `pg_enable_utf8` only affects values returned from the DB – ikegami Mar 13 '17 at 21:02
  • @ikegami The documentation makes it sound like that, but it's not true. – nwellnhof Mar 15 '17 at 13:37
  • Are you really sure that `client_encoding` defaults to UTF-8? What's the encoding of your database? – nwellnhof Mar 15 '17 at 13:57

0 Answers0