The Problem
Any Gmail message can be saved as a single raw file. My assumption is that such a raw file contains everything needed to properly display the email along with all of its components.
I was looking for a way to process such a file programmatically. There are two approaches to processing Gmail messages:
Interfacing with the Gmail server via the Gmail API. Doing so requires authentication followed by an HTTP / HTTPS interaction, as explained in the Gmail API documentation.
Statically parsing the raw data and extracting from it all the elements that together make up an entire email message. These may include:
- Email's attributes (sender's name, sender's email, date, subject, etc.)
- Body (usually HTML, which may reference embedded images and other files that are required for it to display properly).
- Attachments.
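To illustrate the first item, the attributes live in the header block at the top of the raw message, before the first empty line. The following is a minimal sketch using only the standard library, not the API of any particular MIME parser; `HeaderMap`, `ParseHeaders` and `HeaderValue` are hypothetical names introduced here. It assumes plain headers and does not decode RFC 2047 encoded words such as `=?UTF-8?B?...?=`.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical type: lower-cased header name -> header value.
using HeaderMap = std::map<std::string, std::string>;

// Lower-cases a string so header lookups are case-insensitive.
static std::string ToLower(std::string s) {
    for (auto& ch : s)
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return s;
}

// Parses the header block of a raw message into name/value pairs.
// Folded continuation lines (starting with a space or tab) are appended to
// the previous header. Parsing stops at the first empty line, which separates
// the headers from the body. Simplification: if a header occurs more than
// once (for example "Received"), only the last occurrence is kept.
HeaderMap ParseHeaders(const std::string& raw) {
    HeaderMap headers;
    std::istringstream in(raw);
    std::string line, current;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;  // end of the header block
        if ((line[0] == ' ' || line[0] == '\t') && !current.empty()) {
            const std::size_t p = line.find_first_not_of(" \t");
            if (p != std::string::npos) headers[current] += " " + line.substr(p);
            continue;
        }
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos || colon == 0) continue;  // not a header line
        current = ToLower(line.substr(0, colon));
        const std::size_t v = line.find_first_not_of(" \t", colon + 1);
        headers[current] = (v == std::string::npos) ? std::string() : line.substr(v);
    }
    return headers;
}

// Returns the value of a header, or an empty string if it is absent.
// The name must already be lower-case.
std::string HeaderValue(const HeaderMap& headers, const std::string& lowerName) {
    assert(lowerName == ToLower(lowerName));
    const auto it = headers.find(lowerName);
    return it == headers.end() ? std::string() : it->second;
}
```

With this sketch, `HeaderValue(ParseHeaders(raw), "subject")` would give the subject line, with folded lines joined by a single space.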
My question:
How can I statically parse a Gmail message's raw data without interacting with the Gmail server / API, just by using a MIME parser like this one and adding, on top of it, any code required to find and extract the Gmail-specific elements listed above?
What I wrote so far:
I have started parsing the raw data (stored in szMailBody) using this parser:
// Inputs (function parameters): LPCSTR szMailId, LPCSTR szMailBody
MIMELIB::CONTENT c;

// Skip leading whitespace
while ((*szMailBody == ' ') || (*szMailBody == '\r') || (*szMailBody == '\n'))
{
    szMailBody++;
}

// Skip to the start of the raw message text
const char deli[] = "<pre class=\"raw_message_text\" id=\"raw_message_text\">";
szMailBody = strstr(szMailBody, deli);
if (szMailBody == nullptr)
    return; // Marker not found; strstr returned NULL
szMailBody += strlen(deli);

if (c.Parse(szMailBody) != MIMELIB::MIMEERR::OK)
    return;

// Get some headers
auto senderHdr = c.hval("From");
auto dateHdr = c.hval("Date");
auto subjectHdr = c.hval("Subject");

// Not a multi-part mail if empty
// Then use c.Decode() to get and decode the single part body
auto a1 = c.hval("Content-Type", "boundary");
if (a1.empty())
    return;

// a1 holds the boundary string, e.g. _NextPart_000_0046_01D38959.20888970.
// It is a delimiter, not a header name, so it is passed directly to the
// splitter instead of being looked up with hval().
vector<MIMELIB::CONTENT> Contents;
MIMELIB::ParseMultipleContent2(szMailBody, strlen(szMailBody), a1.c_str(), Contents);
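To make clear what splitting on the boundary involves, here is a small stand-alone sketch that does not use the parser above. `SplitMultipart` is a hypothetical helper, not part of any library. It assumes the boundary string never occurs inside a part's content and does not check that each delimiter starts at the beginning of a line, which a real parser should do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a multipart body into its raw parts. Each returned part still holds
// its own headers, an empty line, and the (possibly encoded) content.
// `boundary` is the value of the Content-Type "boundary" parameter, without
// the leading "--". Text before the first delimiter (the preamble) is ignored.
std::vector<std::string> SplitMultipart(const std::string& body,
                                        const std::string& boundary) {
    assert(!boundary.empty());
    const std::string delim = "--" + boundary;
    std::vector<std::string> parts;

    std::size_t pos = body.find(delim);
    while (pos != std::string::npos) {
        std::size_t start = pos + delim.size();
        // "--boundary--" is the closing delimiter: nothing follows it.
        if (body.compare(start, 2, "--") == 0) break;

        // The part begins on the line after the delimiter line.
        start = body.find('\n', start);
        if (start == std::string::npos) break;
        ++start;

        const std::size_t next = body.find(delim, start);
        if (next == std::string::npos) break;  // truncated: no closing delimiter

        // The line break just before a delimiter belongs to the delimiter,
        // not to the part, so drop it (CRLF or bare LF).
        std::size_t end = next;
        if (end > start && body[end - 1] == '\n') {
            --end;
            if (end > start && body[end - 1] == '\r') --end;
        }
        parts.push_back(body.substr(start, end - start));
        pos = next;
    }
    return parts;
}
```

Each returned part can itself be a multipart entity (for example `multipart/alternative` nested inside `multipart/mixed`), so in practice the split would be applied recursively using that part's own boundary.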
My question is different from this one because Gmail raw data is complex enough to require further steps, even when the user is familiar with MIME parsing. There is additional complexity in extracting attachments into separate files, for example, or in restoring the email's body as an HTML file along with its dependencies (such as embedded images). Processing Gmail raw data requires a layer of logic on top of MIME parsing.
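As a rough illustration of the kind of layer I mean, the sketch below covers two such steps: decoding a base64-encoded part so it can be written to a file, and rewriting `cid:` references in the HTML body so they point at the extracted files. `DecodeBase64` and `RewriteCidReferences` are hypothetical helpers written with the standard library only. The sketch assumes the Content-ID values have already been stripped of their surrounding angle brackets, and that no Content-ID is a prefix of another.

```cpp
#include <cassert>
#include <map>
#include <string>

// Decodes base64 text as used by "Content-Transfer-Encoding: base64".
// Characters outside the base64 alphabet (CR, LF, spaces) are skipped and
// decoding stops at the first '=' padding character.
std::string DecodeBase64(const std::string& in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int buffer = 0;  // accumulates 6-bit groups
    int bits = 0;             // number of not yet emitted bits in buffer
    for (char ch : in) {
        if (ch == '=') break;
        const std::size_t idx = alphabet.find(ch);
        if (idx == std::string::npos) continue;
        buffer = (buffer << 6) | static_cast<unsigned int>(idx);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

// Replaces each "cid:<id>" reference in the HTML with the local file name
// mapped to that Content-ID. References with unknown IDs are left untouched.
std::string RewriteCidReferences(std::string html,
                                 const std::map<std::string, std::string>& cidToFile) {
    for (const auto& entry : cidToFile) {
        assert(!entry.first.empty());
        const std::string needle = "cid:" + entry.first;
        std::size_t pos = 0;
        while ((pos = html.find(needle, pos)) != std::string::npos) {
            html.replace(pos, needle.size(), entry.second);
            pos += entry.second.size();  // continue after the inserted name
        }
    }
    return html;
}
```

The decoded bytes of each image part would be saved under the file name stored in the map, and the rewritten HTML saved next to them, so that the body can be opened offline.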