
I'm reading an email file where the first line in the file (so first line in the header) is:

X-RCPT-TO-LIST: 1,2,3

I'm loading it using CDO and ADODB like this:

        ADODB.Stream stream = new ADODB.Stream();
        stream.Open(Type.Missing, ADODB.ConnectModeEnum.adModeUnknown, ADODB.StreamOpenOptionsEnum.adOpenStreamUnspecified, String.Empty, string.Empty);
        stream.LoadFromFile(filename);
        stream.Flush();
        CDO.Message msg = new CDO.Message();
        msg.DataSource.OpenObject(stream, "_Stream");
        msg.DataSource.Save();

Then I'm trying to get the field like this:

ADODB.Field f = msg.Fields["urn:schemas:httpmail:X-RCPT-TO-LIST"];

This does not work; it returns an empty field (null values).

Looking at the fields in the debugger, I see that the field name is:

urn:schemas:mailheader:ÿþx-rcpt-to-list

I assume my code might work if I look for those weird characters, but I'm worried they might change from one email to the next. Any ideas why those strange characters are added? Is there a better way to access custom header fields (without reading the file myself and parsing it)?

I'm running this test on Windows XP with all of the latest patches (SP3 I think).

Sorry if I tagged this wrong; I had trouble finding tags for this. I'm using C#, if that isn't obvious.

Here is the entire email file. I removed some junk (partly for privacy reasons), but I retested with this exact version and got the same results:

X-RCPT-TO-LIST: 1,2,3
Received: by mail-ia0-f172.google.com with SMTP id l29so4135896iag.3
        for <423a777e2af27f463b801fe2eb2242cbdf1d934000000001@users.domain.com>; Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.50.195.134 with SMTP id ie6mr6320542igc.6.1364007120542;
 Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
Received: by 10.50.169.39 with HTTP; Fri, 22 Mar 2013 19:52:00 -0700 (PDT)
Date: Fri, 22 Mar 2013 19:52:00 -0700
Message-ID: <XXXXXXXX63pPLB9QYu=04W3mU3Ynhkjf2bdYYZqv5oVvQ__u1vg@mail.gmail.com>
Subject: test4
From: <xxxxx2003@gmail.com>
To: 423a777e2af27f463b801fe2eb2242cbdf1d934000000001 <423a777e2af27f463b801fe2eb2242cbdf1d934000000001@users.domain.com>
Content-Type: multipart/alternative; boundary=14dae9340b45e63f6204d88ea7fa

--14dae9340b45e63f6204d88ea7fa
Content-Type: text/plain; charset=UTF-8

test4

-- 
xxxxxx@gmail.com
I don't check *this account* very often

--14dae9340b45e63f6204d88ea7fa
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">test4<br clear=3D"all"><div><br>-- <br><div><a href=3D"mai=
lto:xxxxx@gmail.com" target=3D"_blank">xxxxx@gmail.com</a></div>
<div>I don&#39;t check <b>this account</b> very often</div>
<div>=C2=A0</div>
</div></div>

--14dae9340b45e63f6204d88ea7fa--

The X-RCPT-TO-LIST line is added by code in my email server that translates the RCPT TO:<> lines into internal user IDs. That way, the thread that later processes these files knows where to place the mail. I don't want to keep this info in a separate file or anything like that, because I like my current design. I just want to know why CDO/ADODB is translating my message header into some weird name, like a Unicode vs. ASCII mismatch or something goofy.

eselk
  • What did you use to send that message? – NotMe Mar 25 '13 at 16:03
  • The message came from Gmail. It was received on my email server, which this code is part of. I will add the entire raw file to my question, in case it helps. – eselk Mar 25 '13 at 16:50

3 Answers


"ÿþ" as first symbols of a text stream are so-called "byte order mark" most of the time. See eg. Wikipedia entry. They appear in a stream because they are in a file being read. BOM must show up if one opens a file with a hex-editor and checks its first bytes. For instance, "ÿþ" is a text representation of 0xFFFE.

Why are these bytes in the file in the first place? It depends on how the file was created. This question may be helpful: Can I export excel data with UTF-8 without BOM?

Ilya Kurnosov
  • Thanks for teaching me something new. With this info I should be able to come up with a better (non-hack) fix for my code. – eselk Apr 01 '13 at 15:39

Unless someone has a better answer (for example, that my code for loading the message has a bug in it), I'm going to accept this as the answer...

It appears to be a bug in CDO or ADODB that affects the first line of any message. I tested this by removing my X-RCPT-TO-LIST line, so that the first line was a standard "Received:" line, and in that case the Received line had the weird characters added to its name. I also tested several other emails with different headers as the first line, and in every case the first line had the weird characters added to its name. I can only imagine that either the bug has been fixed (I'm using XP, which is pretty old), or most people using CDO haven't noticed because they don't do anything with the Received: lines, and a Received: line is usually the first line in the header.

To avoid the issue, I will just add an extra line at the top, so I'll have:

X-CDO-FIX: fix
X-RCPT-TO-LIST: 1,2,3
...normal header here...
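A minimal sketch of how the server could emit the throwaway line together with the routing header, so that whatever mangling hits the first line lands on X-CDO-FIX instead. `RoutingHeader.Build` is a hypothetical helper, and representing the internal user IDs as integers is an assumption based on the "1,2,3" example.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RoutingHeader
{
    // Hypothetical helper: builds the lines written at the very top of the
    // stored message. The first line is a sacrificial header; the routing
    // header follows it, so it is no longer the first line CDO parses.
    public static string Build(IEnumerable<int> userIds)
    {
        var sb = new StringBuilder();
        sb.Append("X-CDO-FIX: fix\r\n");  // sacrificial first header line
        sb.Append("X-RCPT-TO-LIST: ")
          .Append(string.Join(",", userIds))
          .Append("\r\n");
        return sb.ToString();
    }
}
```

The returned text would be written ahead of the normal header, exactly where the X-RCPT-TO-LIST line is written today.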

I tested this and it works, so I'm happy. I'll leave the question open for a few days in case someone can provide more information, worthy of the bounty I have started, that might help others as well.

eselk
  • I've also opened the email file(s) with a hex editor to confirm there are no weird/extra characters at the start of my files. – eselk Mar 25 '13 at 18:03
  • Have to admit that the problem is very odd. Another "fix" would be to look for and remove that particular character prior to processing. – NotMe Mar 25 '13 at 19:03
  • It reminds me of an issue I saw years ago, when I was using telnet to test sending emails. If I made the TO: or FROM: lines the first lines in the header when sending, I recall some email client wasn't handling it correctly (acting like the field wasn't there). I don't remember all of the details now, but I imagine this could have been related, if that email client was using this version of CDO to parse them. I want to say it was Outlook that had the problem, but I'm only 60% sure on that (was a long time ago). – eselk Mar 25 '13 at 20:23

I'm sharing this information in case someone else runs into the same problem, as I did.

You open your ADODB.Stream object with all default values. This creates a stream of type adTypeText with its Charset set to Unicode (the default behaviour).

When you call LoadFromFile, the data is read from the file and, because the stream's Charset is Unicode, a BOM is prepended to the stream data, even if it is not present on disk. The same happens if the Charset is UTF-8.

If you call ReadText on the stream, you get the text (assuming the Charset is the right one); if you change its Type to adTypeBinary and call Read, you see the BOM before the actual data.

Then you feed that stream to a CDO.Message. Before consuming the stream data, the CDO.Message changes the stream's Type to adTypeBinary and reads it as bytes. Hence the funny characters in front of the first line.

If you know for sure that there is no BOM in the file on disk, just change the stream Type to adTypeBinary before calling LoadFromFile.

If you're not sure, read the file content into one big string with your favorite method, then set the stream Type to adTypeBinary and Write the string's bytes into it before giving it to the CDO.Message object. Do not use WriteText, as it would also prepend the BOM to the data.
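A minimal C# sketch of both options, reusing the Open call from the question. It assumes COM references to the ADODB and CDO type libraries (as in the question). For the second option it also assumes the file can be read as UTF-8 (File.ReadAllText's default), which would not hold for 8-bit content in another charset. `CdoLoader` and its method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Assumes COM references to ADODB ("Microsoft ActiveX Data Objects") and
// CDO ("Microsoft CDO for Windows 2000 Library"), as in the question.
public static class CdoLoader
{
    // Option 1: no BOM on disk. Switching the still-empty stream to binary
    // before LoadFromFile keeps the text/Charset handling out of the way.
    public static CDO.Message LoadBinary(string filename)
    {
        ADODB.Stream stream = OpenEmptyStream();
        stream.Type = ADODB.StreamTypeEnum.adTypeBinary; // Position is still 0 here
        stream.LoadFromFile(filename);
        return ToMessage(stream);
    }

    // Option 2: BOM may or may not be present. File.ReadAllText detects and
    // drops a Unicode BOM; Encoding.GetBytes never emits one.
    // Assumption: the file content is ASCII/UTF-8.
    public static CDO.Message LoadViaString(string filename)
    {
        string text = File.ReadAllText(filename);
        byte[] bytes = Encoding.UTF8.GetBytes(text);

        ADODB.Stream stream = OpenEmptyStream();
        stream.Type = ADODB.StreamTypeEnum.adTypeBinary;
        stream.Write(bytes); // binary Write, not WriteText (per the answer above)
        return ToMessage(stream);
    }

    private static ADODB.Stream OpenEmptyStream()
    {
        var stream = new ADODB.Stream();
        stream.Open(Type.Missing, ADODB.ConnectModeEnum.adModeUnknown,
                    ADODB.StreamOpenOptionsEnum.adOpenStreamUnspecified,
                    string.Empty, string.Empty);
        return stream;
    }

    private static CDO.Message ToMessage(ADODB.Stream stream)
    {
        stream.Position = 0; // let CDO read from the start of the buffer
        var msg = new CDO.Message();
        msg.DataSource.OpenObject(stream, "_Stream");
        return msg;
    }
}
```

With either loader, looking up `urn:schemas:mailheader:x-rcpt-to-list` in `msg.Fields` should then find the header without the ÿþ prefix on its name. Option 1 is the cheaper one when you control how the files are written, as the mail server here does.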

Hope this makes sense and may help someone.

Peyre