It's been a decade I have worked with Postgres and Perl.
One of my oldest still-operated applications, an dictionary of government addresses and departmental responsibilities, has issues handling query terms containing accented characters, for example köln
. In other words, whenever a query term contains a accented character (mainly umlauts) there are 0 results returned.
I have to mention that this behavior is only happening using this application with Postgres as the database. If I switch to MySQL5 (same data) same queries are working correctly.
Trying to track the cause of this problem I have checked the following:
- Postgres database is UTF-8 (using the command
show server_encoding;
) - Postgres client encoding is also UTF8 (using
show client_encoding;
) - If I use the Postgres monitor and execute the same SQL query as the application does, using accented characters in the query term, I get correct results
- The Perl application itself is handling UTF-8, the HTML-Header is set correctly, contents of the output display correct and not garbled
- All Perl code files, scripts, .pm package files and templates are UTF-8 encoded (I verified that with
file --mime perl_file_name
) - I fiddled with the database connection, setting
$self->{dbh}->{pg_enable_utf8} = 1;
or/and$self->{dbh}->do("SET CLIENT_ENCODING TO 'UTF8';");
or/and$self->{dbh}->do("SET NAMES 'UTF8';");
with no change - I've updated the
DBD::Pg module
to version 3.6.2, no change.
So I am pretty much out of ideas what else to check or try to get Postgres fully working. Like mentioned in my intro, same application just using MySQL as database works flawlessly.
2 years ago the application was changed to handle UTF-8 data, I did not do the changes myself, but as far as I can see in the code (compared to the code in my GIT repo) its just the HTML UTF8-Header print "Content-type: text/html; charset=utf-8\n\n";
and a few unrelated template parts. Perhaps that change somewhere is the origin for all the problems but I don't know what esp. to adjust for Postgres.
The current Perl version is 5.22.1, using Apache/2.2.22 (Ubuntu). The vhost configuration is simple:
AddHandler cgi-script .cgi .pl
ScriptAlias /...abs-path-to-app.../cgi-bin/
<Directory "/...abs-path-to-app.../cgi-bin/">
AllowOverride None
Options +Indexes +ExecCGI +MultiViews +SymLinksIfOwnerMatch
<IfVersion < 2.4>
Allow from all
</IfVersion>
<IfVersion >= 2.4>
Require all granted
</IfVersion>
Allow from all
</Directory>
Postgres is version 9.1.24.
Edit:
Collate
and Ctype
is set to en_US.UTF-8
, Encoding
is set to UTF-8
for the database in question.
Taking a look into the tables, all character varying
columns use pg_catalog."default"
collation. Executing show lc_collate;
show already mentioned en_US.UTF-8
.
Edit2:
Using the DBD::Pg
flag pg_enable_utf8
and setting it to 0
seems to work out and I get the expected results. Using a value other than 0
, for example '-1or
1` does not work. I tried out that flag (once again) right after the database connect. Actually I have to verify this as I still do not really understand what's going on.