I have written several PHP scripts that read the contents of a database and output them in an email message. Every once in a while, a SPACE (0x20) character appears in the output where there shouldn't be one. For example, one script references a PHP global variable containing exactly "n" non-space characters, and sometimes (not always), when that variable is written to an email message, the string appears with an embedded blank (making its total length "n+1"). Other times, an HTML tag such as <BR> appears as < BR> (note the SPACE before the "B").

Because the behavior of the script is not consistent (some emails are affected, and others aren't), I can't seem to find the problem.
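
One way to narrow this down is to inspect the message body just before it is handed to the mailer, which shows whether the space is already present in the script's output or only appears after sending. The following is a minimal sketch of that check, not code from the script itself: findSpacedTags() is a hypothetical helper name, and the sample body is made up for illustration.

```php
<?php
// Hypothetical diagnostic helper (not part of the original script):
// returns the byte offset of every "<" that is followed by whitespace
// before a tag name, e.g. "< BR>" or "<  /p>".
function findSpacedTags(string $body): array
{
    $offsets = [];
    if (preg_match_all('/<\s+\/?[A-Za-z]/', $body, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as $match) {
            $offsets[] = $match[1]; // byte offset of the "<"
        }
    }
    return $offsets;
}

// Example: check a body before sending and log a hex dump if it is already damaged.
$body = "Line one<BR>Line two< BR>Line three";
$bad = findSpacedTags($body);
if ($bad !== []) {
    error_log('Spaced tag at offset(s) ' . implode(',', $bad) . ': ' . bin2hex($body));
}
```

If the log stays empty for an email that still arrives with a stray space, the space was most likely introduced after the script produced the body. Note that the pattern also matches plain text such as "a < b", so hits need a quick manual look.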

I am enclosing a link to the PHP script that occasionally embeds a space into the BREAK tag. I have removed the lines that provide specific login information for the databases; everything else is intact. In the code file at the link below, line 281 is the one that produced the BREAK tag with the embedded SPACE (as described above). This has happened only once!

http://jem-software.com/temptest.txt

I guess the only other potentially relevant information is that this script file is taken from code entered into a JUMI code block contained within a Joomla! based website.

Edit 1:

Thank you, Riccardo, for your suggestions. Here is some more clarification:

  1. I am not reading an email and parsing the results in order to insert into a database. Just the opposite, I am reading from a database and using the results to create an email. I will check the database to see what character set was used, and explicitly pass the character set to see if that makes a difference.

  2. I don't use Joomla functions to access the database because the database I am referencing is external to the Joomla! environment. It is a pre-existing database created from PHP scripts written several years prior. When my old website was re-written using Joomla, I wanted to "port" the PHP database access code intact, so I installed the JUMI plugin to make this possible.

  3. I will check the character encoding used in the database and match it to the character set of the email message.

  4. I don't understand how a character encoding issue would result in the insertion of a SPACE into the hard-coded HTML tag - this tag did not come from any database, but was typed into the email as a literal string.
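
Items 1 and 3 amount to declaring one character set for the message and making sure the body really is in that encoding. The following is a minimal sketch of that idea, assuming UTF-8 throughout. buildHtmlMail() is a hypothetical helper name, and the base64 transfer encoding is an assumption of this sketch rather than something the original script does.

```php
<?php
// Hypothetical helper: builds the extra headers and an encoded body for an HTML email.
// Assumes $html is already valid text in the given charset (UTF-8 by default).
function buildHtmlMail(string $html, string $charset = 'UTF-8'): array
{
    $headers = [
        'MIME-Version: 1.0',
        'Content-Type: text/html; charset=' . $charset,
        'Content-Transfer-Encoding: base64',
    ];
    // chunk_split() breaks the base64 text into 76-character lines ending in CRLF.
    $body = chunk_split(base64_encode($html));
    return [implode("\r\n", $headers), $body];
}

[$headers, $body] = buildHtmlMail("Name: Zo\xC3\xAB<BR>Status: OK");
// mail($to, $subject, $body, $headers); // $to and $subject come from the surrounding script
```

Encoding the body this way also means the literal HTML is no longer sent as raw, possibly very long lines, which removes one variable when testing whether the stray space comes from the script or from somewhere in transit.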

JEfromCanada

1 Answer

This is a strange issue, but here are my two cents:

First, you're not using Joomla's functions to access the database and the mail subsystem. This can work, but it isn't ideal.

Second, this smells like a character set / code page issue.

Here are a few considerations on the character set issue:

I read through your code quickly and didn't notice anything wrong. However, Joomla uses UTF-8, and your database connection never specifies a character set (there is no call to mysql_set_charset()), which could be the first issue.
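
Note that mysql_set_charset() belongs to the old mysql extension, which was removed in PHP 7. The following is a minimal sketch of the same idea using mysqli. The helper names are hypothetical and the connection parameters are placeholders, so treat it as an illustration rather than a drop-in replacement for the script's connection code.

```php
<?php
// Hypothetical helper: opens a connection and forces the connection charset.
// Host, user, password and database name are placeholders supplied by the caller.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }
    // mysqli counterpart of mysql_set_charset(). utf8mb4 is MySQL's full UTF-8;
    // servers older than 5.5.3 only know 'utf8'.
    if (!$link->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }
    return $link;
}

// Cheap sanity check for values read back from the database:
// with the "u" modifier, preg_match() fails on invalid UTF-8 input.
function isValidUtf8(string $value): bool
{
    return preg_match('//u', $value) === 1;
}
```

Running isValidUtf8() on each column value before it goes into the email body is a quick way to see whether the database is handing back bytes in some other encoding.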

The second is that the emails you read will arrive in different character sets, depending on the senders' settings. Make sure you handle the code page issues properly. The following is a snippet of a function I use for parsing email:

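// Excerpt from a class method; $data (the raw body of the message part)
// is defined elsewhere in that method and is not shown here.
$mime = imap_fetchmime($this->connection, $this->messageNumber, $partNumber);
return $this->decodeMailBody($data, $mime); // QUOTED_PRINTABLE

function decodeMailBody($string, $mime) {
    $str = quoted_printable_decode($string);

    // Example $mime values:
    //mime: Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8
    //mime: Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252
    // Use the value after the last "charset=", stopping at whitespace, a quote or ";".
    $charset = 'utf-8'; // assumption: treat the body as UTF-8 when no charset is declared
    if (preg_match_all('/charset="?([^\s";]+)/i', $mime, $matches)) {
        $charset = strtolower(end($matches[1]));
    }
    echo "<h3>mime: $mime; charset $charset</h3>"; // debug output, now printed after $charset is set

    if ($charset == 'utf-8') {
        return $str;
    }
    // iconv() returns false if the conversion fails
    return iconv($charset, 'UTF-8', $str);
}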

Last, make sure you use UTF-8 when you insert the mail into the database after parsing it.

Riccardo Zorn