0

The stack:

a simple form with a textfield=> posted to an apache web server running on unix => processed by php 5.3 => saved in mysql5 table with the latin1 character set

Than, retrieved from the DB and shown to the user.

Result: when the user entered the ® symbol into the form, it's saved into the db as two symbols instead of one and then appears wrong in the ZPL printer

By the way, it doesn't happen when the same runs on the development machine which runs Windows rather than Unix

What am I doing wrong?

edit: I cannot change the character set of the mysql table, there is a company standard for latin1 which I don't appreciate but it is still the case

shealtiel
  • 8,020
  • 18
  • 50
  • 82

2 Answers2

1

character encoding plays a rule at multiple places in your scenario

  • which encoding does the client use for sending the parameters to the server
  • which encoding does the mysql server use to intepret characters send from the php script to the server
  • which encoding does the mysql server use to send results back to the php script
  • which encoding does the mysql server use to store the data
  • which encoding does the client/browser use to render the data send from the php script to the client/browser

It's simplest to keep the whole chain at utf-8 encoding:
Set the character encoding of the html documents you send to the browser to utf-8, e.g. via default_charset.
Indicate to the mysql server that the connection charset is utf-8, e.g. via setting the charset property in the PDO-MYSQL DSN.
Create the table using utf-8 as default charset and without specifiying another encoding for the field itself.

Self-contained example:

<?php
ini_set('default_charset', 'utf-8');
$fn = isset($_POST['val']) ? 'doDatabase' : 'doForm';
doHtml($fn);


function doHtml($fn) { ?>
<html>
    <head><title>...</title></head>
    <body>
        <?php $fn(); ?>
    </body>
</html>
<?php
}

function doForm() { ?>
    <form method="POST" action="?">
        <div>
            <input type="text" name="val" value="&reg;" />
            <input type="submit" />
        </div>
    </form>
<?php       
}

function doDatabase() {
    $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);

    $pdo->exec('
        CREATE TEMPORARY TABLE soFoo (
            id int auto_increment,
            val varchar(30),
            primary key(id)
        ) DEFAULT CHARACTER SET utf8
    ');

    $stmt   = $pdo->prepare('INSERT INTO soFoo (val) VALUES (?)');
    $stmt->execute( array($_POST['val']) );
    $stmt = null;

    foreach( $pdo->query('SELECT * FROM soFoo', PDO::FETCH_ASSOC) as $row ) {
        echo 'row: ', join(', ', $row), "<br />\r\n";
    }
}
VolkerK
  • 95,432
  • 20
  • 163
  • 226
  • Awesome, I hoped for this kind of answer. Now, didn't you skip the character sets of 1. php 2. the os ? (you included php in the code sample but not in the explanation – shealtiel Apr 24 '13 at 11:56
  • I doubt that I will be able to change the character set of the table, not my decision unfortunately – shealtiel Apr 24 '13 at 11:58
  • And. I would really like to understand one day the whole topic of character sets, across all the software stack that I use. Could you recommend a reading? – shealtiel Apr 24 '13 at 12:00
  • The php core itself currently has no notion of character encoding, The os doesn't play into the handling at this scope. The important part is a) indicate to the client/browser that all input/output is in utf-8 b) let the mysql server know that all input/output is done in utf-8. a) is realized by setting the default_charset directive which lets php send an http header containg charset=utf-8. b) is realized by setting the charset=utf8 porpert in the DNS when creating the PDO object. see https://dev.mysql.com/doc/refman/5.0/en/charset-connection.html – VolkerK Apr 24 '13 at 12:47
  • As long as the character set of the field in the mysql table has an representation for ® you can use that charset (mysql will do the conversion automagically). If there is no representation of that symbol, mysql cannot store it. It would still be possible to do some conversions on your own but I wouldn't recommend that. – VolkerK Apr 24 '13 at 12:48
  • the sign is part of the character set, but in fact it's stored with an additional character before it. This happens only on unix, on windows it's being stored correctly – shealtiel Apr 24 '13 at 12:58
0

You should try mysql5 table with the utf8_general_ci collation

Shu
  • 11
  • 1
  • utf8 contain everything you need and 99% of webpage use it (aka ) For example if your page use UTF8 and your database in latin_* you must use echo utf8_encode($field); anytime. – Shu Apr 24 '13 at 11:34