
I converted my MySQL database and my Perl scripts to UTF-8. It finally works, but only with a somewhat crazy workaround. This is what I did:

MYSQL DATABASE: Completely set to UTF8

PERL-SCRIPTS: Source code converted to UTF8 encoding. "use utf8;" at the top.

HTML-HEADERS:

  print "Content-type: text/html\n\n";

and

  <META CHARSET='UTF-8'>

DATABASE-CONNECTS:

  $dbh->{'mysql_enable_utf8'} = 1;

and

  "set names 'utf8';"

Now everything works with Cyrillic (Russian) characters: input, output and processing in the database are all fine. The problem is with German umlauts: äöü. They are not shown correctly in the browser. They only work if I put a Cyrillic character in a comment next to the HTML element that fails to show the character, e.g. something like this:

  <!-- Э -->

This is an awkward solution, and I know there has to be one that does not need it. Does anybody know what could be missing? Thanks in advance for every answer!

UPDATE:

Thank you for your response. I figured out that I have the problem even with the simplest HTML file. I use this source code:

#!/usr/bin/perl
use utf8;
print "Content-Type: text/html; charset=utf-8\n\n";
print <<END;
    <HTML>
        <HEAD>
            <META CHARSET='UTF-8'>
        </HEAD>
        <BODY>
            <H1>The Country Österreich</H1>
        </BODY>
    </HTML>
END

The result can be seen at: http://5mls.com/test_bad.cgi
As you can see, the "Ö" is not shown.
Now the code that works:

#!/usr/bin/perl
use utf8;
print "Content-Type: text/html; charset=utf-8\n\n";
print <<END;
    <HTML>
        <HEAD>
            <META CHARSET='UTF-8'>
        </HEAD>
        <BODY>
            <H1>The Country Österreich<!-- Э --></H1>
        </BODY>
    </HTML>
END

The result can be seen at: http://5mls.com/test_good.cgi
This time the "Ö" is shown correctly because of the Russian character "Э" in the comment. Does anybody know how the "Ö" could be shown without the Russian character?

  • Send utf-8 in the header too: `print "Content-type: text/html; charset=utf-8\n\n";` The HTTP header will be considered first. – VMai Jul 06 '14 at 14:21
  • You need to figure out where the problem is. 1) Do you have phpMyAdmin? If so, do you see the correct string in the database when using phpMyAdmin? I know that program reliably displays what's actually in the database. – ikegami Jul 06 '14 at 15:54
  • 2) Do you correctly fetch the data from the database? Provide the output of `sprintf("U+%v04X", $str)` and what you expect that string to be. – ikegami Jul 06 '14 at 15:55
  • 3) Do you correctly encode the output? If possible, use `wget` or `curl` to fetch the HTML page. Use a hex editor to examine the bytes of the string and provide those. – ikegami Jul 06 '14 at 15:56
  • I would look at the http headers first. – VMai Jul 06 '14 at 16:49
  • Thank you for your answers. I now use the header – user3809670 Jul 08 '14 at 15:16
  • I now use the header `print "Content-Type: text/html; charset=\"UTF-8\"\n\n";`, but it does not change the problem. I know the data is correct in the database; I checked that with PuTTY. I can start MySQL in PuTTY, and everything is shown correctly there. And when I use my trick with the `<!-- Э -->` comment, it works completely. Loading and saving UTF-8 characters repeatedly does not change them. Only when I leave out the weird comment is the HTML page not shown correctly. – user3809670 Jul 08 '14 at 15:18
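
The checks suggested in the comments above can be sketched in Perl as follows. This is only an illustration: `code_points` and `utf8_hex` are hypothetical helper names, and the literal "Ö" stands in for a value fetched from the database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # this file is saved as UTF-8, so "Ö" is one character
use Encode qw(encode);

# Hypothetical helper: list the code points of a character string.
sub code_points {
    my ($str) = @_;
    return sprintf("U+%v04X", $str);
}

# Hypothetical helper: show the UTF-8 bytes of a character string as hex.
sub utf8_hex {
    my ($str) = @_;
    my $bytes = encode('UTF-8', $str);
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $str = "Ö";                     # stand-in for a value read from the database
print code_points($str), "\n";     # U+00D6 for a correctly decoded "Ö"
print utf8_hex($str), "\n";        # C3 96 is what a UTF-8 page should contain
```

If the first line shows something like U+00C3.0096 instead, the string was never decoded; if the page fetched with `curl` contains the single byte D6 instead of C3 96, the output was never encoded.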

1 Answer


I also had issues with UTF-8 when working with Perl scripts and MySQL.

Check what your CGI script sends to the browser by using a web developer tool in your browser. Then try encoding what Perl sends to your browser with the Encode module:

use utf8;                 # the script itself is saved as UTF-8
use Encode qw(encode);

and encode your output:

my $austria = "<HTML>
    <HEAD>
        <META CHARSET='UTF-8'>
    </HEAD>
    <BODY>
        <H1>The Country Österreich</H1>
    </BODY>
</HTML>";
# encode() turns the character string into UTF-8 bytes before it is printed
print encode('utf-8', $austria);
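
A likely explanation for the comment trick, assuming standard Perl I/O behavior: under `use utf8`, "Ö" is the single character U+00D6. Without an encoding layer, `print` writes characters below 256 as single Latin-1 bytes, so the browser receives the byte 0xD6, which is not valid UTF-8. Once the string also contains a character above 255, such as "Э", Perl writes the whole string in its internal UTF-8 form (and warns "Wide character in print" if warnings are on), which is why the page then looks right. As an alternative to calling `encode` on every string, an encoding layer can be set on STDOUT once. The sketch below shows that approach; `page` is a hypothetical helper introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this file is saved as UTF-8, so "Ö" is read as one character

# Hypothetical helper: build the page as a character string (not yet encoded).
sub page {
    return <<'END';
<HTML>
    <HEAD>
        <META CHARSET='UTF-8'>
    </HEAD>
    <BODY>
        <H1>The Country Österreich</H1>
    </BODY>
</HTML>
END
}

# Encode every character written to STDOUT as UTF-8 from here on.
binmode(STDOUT, ':encoding(UTF-8)');

print "Content-Type: text/html; charset=utf-8\n\n";
print page();
```

With the layer in place, strings must not be passed through `encode` as well, otherwise they would be encoded twice.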

Besides, your test_bad.cgi script works for me and produces the correct output in my browser.