3

I have a simple PHP web app that accepts icon images via file upload and stores them in a MEDIUMBLOB column.

On my machine (Windows) plus two Linux servers, this works fine. On a third Linux server, the inserted image is corrupted: unreadable after a SELECT, and the length of the column data as reported by the MySQL length() function is about 40% larger than the size of the uploaded file.

(Each server connects to a separate instance of MySQL.)

Of course, this leads me to think about encoding and character set issues. BLOB columns have no associated charsets, so it seems like the most likely culprit is PDO and its interpretation of the parameter value for that column.

  • I've tried using bindValue with PDO::PARAM_LOB, to no effect.
  • I've verified that the images are being received on the server correctly (i.e. am reading them post-upload with no problem), so it's definitely a DB/PDO issue.
  • I've searched for obvious configuration differences between the servers, but I'm not an expert in PHP configuration so I might have missed something.

The insert code is pretty much as follows:

$imagedata = file_get_contents($_FILES["icon"]["tmp_name"]);
$stmt = $pdo->prepare('insert into foo (theimage) values (:theimage)');
$stmt->bindValue(':theimage', $imagedata, PDO::PARAM_LOB);
$stmt->execute();

Any help will be really appreciated.

UPDATE: The default MySQL charset on the problematic server is utf8; it's latin1 on the others.

The problem is "solved" by adding PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES latin1 COLLATE latin1_general_ci" to the PDO constructor.

This seems like a bug poor design to me: why should the charset of the connection have any effect on data for a binary column, particularly when it's been identified as binary to PDO itself with PARAM_LOB?

Note that the DB tables are defined as latin1 in all cases: it's only the servers' default charsets that are inconsistent.

Ben M
  • 22,262
  • 3
  • 67
  • 71
  • Functionally, it acts the same, but you might try array-bound parameters instead of name-bound. I've never had a problem with the PDO. Here's the [class I use](https://github.com/rockerest/myframework/blob/master/backbone/Database.php), and you would probably be interested in the function `qwv($sql, $bindValues)` – rockerest Jun 14 '11 at 15:51
  • By the way, [here's a php bug](http://bugs.php.net/40913) that might have a lot to do with this issue. What version of php is the broken server on? It appears only version >5.3.4 correctly handles BLOB + PDO. – rockerest Jun 14 '11 at 15:54
  • @rockerest: By "array-bound", do you mean `$stmt->execute(array(':theimage' => $imagedata))`? If so, that's what I was using when I first saw the issue. – Ben M Jun 14 '11 at 15:54
  • @Ben M, something like that. I just prefer to use `?` and then match with array values, but I don't think that would be a big deal. – rockerest Jun 14 '11 at 15:55
  • @rockerest: That bug relates to using `bindColumn` to retrieve binary data: the problem I'm having is on insert. – Ben M Jun 14 '11 at 15:56
  • 1
    Maybe [this comment by knl@bitflop.com](http://www.php.net/manual/en/pdo.lobs.php#85897) relates to your issue. He seems to think that character encoding DOES actually have something to do with how binary data is stored. – rockerest Jun 14 '11 at 15:59
  • Can you ensure that the data is the same across all system prior you insert it into the db? – hakre Jun 14 '11 at 17:04
  • @hakre: yeah, it's all the same. – Ben M Jun 14 '11 at 17:47

1 Answers1

2

This seems like a bug to me: why should the charset of the connection have any effect on data for a binary column, particularly when it's been identified as binary to PDO itself with PARAM_LOB?

I do not think that this must be a bug. I can imagine that whenever the client talks with the server and says that the following command is in UTF-8 and the server needs it in Latin-1, then the query might get re-encoded prior parsing and execution. So this is an encoding issue for the transportation of the data. As the whole query prior parsing will get influenced by this re-encoding, the binary data for the BLOB column will get changed as well.

From the Mysql manual:

What character set should the server translate a statement to after receiving it?

For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

Or on the way back: Latin1 data from the store will get converted into UTF-8 because the client told the server that it prefers UTF-8 for the transportation.

The identifier for PDO itself you name looks like being something entirely different:

PDO::PARAM_LOB tells PDO to map the data as a stream, so that you can manipulate it using the PHP Streams API. (Ref)

I'm no MySQL expert but I would explain it this way. Client and server need to negotiate which charsets they are using and I assume they do this for a reason.

hakre
  • 193,403
  • 52
  • 435
  • 836
  • The docs say that about PARAM_LOB, but it can also be used for non-stream data. Your explanation makes sense, and I agree that characterizing the behavior as a bug is probably not right. I assumed that PDO and MySQL would work well enough together that data destined for binary columns would be excluded from this reinterpretation. – Ben M Jun 14 '11 at 19:18
  • Well I thought so first as well, and I do not know Mysql that specifically well that I can say where exactly the translation of the encoding of the client server communication takes place. However the docs suggest that such a translation is in action. – hakre Jun 14 '11 at 19:25
  • 1
    So how can i tell PDO to exclude my variable?to not translate my blob file? – Brownman Revival Apr 14 '15 at 10:00
  • @HogRider: What do you mean? I have problems to understand your comment in quick. Are you using a stream or just binary data in a string? BLOB should be binary and never be translated. Check in your SQL Schema with MySQL that the column is of a binary type. – hakre Apr 14 '15 at 10:49
  • The code the you told me to add worked it is now working. I mean why it worked using the construct and why it didnt work before. Can you explain why I want to understand i really dont understan now – Brownman Revival Apr 14 '15 at 11:35
  • Because of the BINARY collation the field has. Please see here in the Mysql manual: https://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html – hakre Apr 14 '15 at 11:40