2

It's been a decade I have worked with Postgres and Perl.

One of my oldest still-operated applications, an dictionary of government addresses and departmental responsibilities, has issues handling query terms containing accented characters, for example köln. In other words, whenever a query term contains a accented character (mainly umlauts) there are 0 results returned.

I have to mention that this behavior is only happening using this application with Postgres as the database. If I switch to MySQL5 (same data) same queries are working correctly.

Trying to track the cause of this problem I have checked the following:

  • Postgres database is UTF-8 (using the command show server_encoding;)
  • Postgres client encoding is also UTF8 (using show client_encoding;)
  • If I use the Postgres monitor and execute the same SQL query as the application does, using accented characters in the query term, I get correct results
  • The Perl application itself is handling UTF-8, the HTML-Header is set correctly, contents of the output display correct and not garbled
  • All Perl code files, scripts, .pm package files and templates are UTF-8 encoded (I verified that with file --mime perl_file_name)
  • I fiddled with the database connection, setting $self->{dbh}->{pg_enable_utf8} = 1; or/and $self->{dbh}->do("SET CLIENT_ENCODING TO 'UTF8';"); or/and $self->{dbh}->do("SET NAMES 'UTF8';"); with no change
  • I've updated the DBD::Pg module to version 3.6.2, no change.

So I am pretty much out of ideas what else to check or try to get Postgres fully working. Like mentioned in my intro, same application just using MySQL as database works flawlessly.

2 years ago the application was changed to handle UTF-8 data, I did not do the changes myself, but as far as I can see in the code (compared to the code in my GIT repo) its just the HTML UTF8-Header print "Content-type: text/html; charset=utf-8\n\n"; and a few unrelated template parts. Perhaps that change somewhere is the origin for all the problems but I don't know what esp. to adjust for Postgres.

The current Perl version is 5.22.1, using Apache/2.2.22 (Ubuntu). The vhost configuration is simple:

AddHandler cgi-script .cgi .pl
ScriptAlias /...abs-path-to-app.../cgi-bin/
<Directory "/...abs-path-to-app.../cgi-bin/">
     AllowOverride None
     Options +Indexes +ExecCGI +MultiViews +SymLinksIfOwnerMatch
     <IfVersion < 2.4>
         Allow from all
     </IfVersion>
     <IfVersion >= 2.4>
         Require all granted
     </IfVersion>
     Allow from all
</Directory>

Postgres is version 9.1.24.

Edit:

Collate and Ctype is set to en_US.UTF-8, Encoding is set to UTF-8 for the database in question. Taking a look into the tables, all character varying columns use pg_catalog."default" collation. Executing show lc_collate; show already mentioned en_US.UTF-8.

Edit2:

Using the DBD::Pg flag pg_enable_utf8 and setting it to 0 seems to work out and I get the expected results. Using a value other than 0, for example '-1or1` does not work. I tried out that flag (once again) right after the database connect. Actually I have to verify this as I still do not really understand what's going on.

halfer
  • 19,824
  • 17
  • 99
  • 186
Andreas W. Wylach
  • 723
  • 2
  • 10
  • 31
  • 1
    What locales and collations are specified for the PostgreSQL database, schemas and tables? – Usagi Miyamoto Aug 11 '17 at 07:17
  • @Usagi Miyamoto: Please see my edit, hope this helps. First I thought to fiddle a bit with the field collations. But as it is UTF8 already and executing queries with the psql monitor find records using accented characters queries I am not sure if this is the right approach. – Andreas W. Wylach Aug 11 '17 at 08:00
  • http://blog.endpoint.com/2014/02/dbdpg-utf-8-perl-postgresql.html – Usagi Miyamoto Aug 11 '17 at 08:14
  • @Usagi Miyamoto: Thanks, interesting article though. But as stated in my check-list, I have DBD Pg > 3.0 installed so there shouldn't be an issue and no need with addt. flags. I also checked that test script illustrated in the article and it seems to work out. – Andreas W. Wylach Aug 11 '17 at 08:54

0 Answers0