My application receives email from users. A reply from Gmail, for example, comes in like this:

This is some new text

On Sun, Apr 1, 2012 at 3:32 AM, My app <
4f77ed3860c258a567aeabf8@myapp.com> wrote:

> Original...
> message..

Of course, this treatment varies from client to client.

Right now I am identifying the '4f77ed3860c258a567aeabf8' and throwing out everything after, because I know what email address they've sent to. This is not a general solution but works for my purposes, except for when there's a line break in the "Original message" line, like in the above example.
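For reference, the approach described above can be sketched roughly as follows, with a small tolerance for a header that wraps once. This is only a sketch under assumptions: `stripAtKnownAddress` is a hypothetical name, and it assumes the header line begins with "On " as in the Gmail example, which other clients will not necessarily do.

```javascript
// Hypothetical helper: cut the reply at the "On ... wrote:" header that
// mentions a known reply-to address, even if the header wraps onto two lines.
function stripAtKnownAddress(body, address) {
  var at = body.indexOf(address);
  if (at === -1) {
    return body; // address is not quoted anywhere; leave the body untouched
  }
  // Check the address's own line, then the line before it, for a line that
  // begins with "On ". That covers a header wrapped once by the client.
  var cut = at;
  for (var hops = 0; hops < 2; hops++) {
    var lineStart = body.lastIndexOf('\n', cut - 1) + 1;
    if (/^On\s/.test(body.slice(lineStart))) {
      return body.slice(0, lineStart).replace(/\s+$/, '');
    }
    if (lineStart === 0) {
      break;
    }
    cut = lineStart - 1; // step back onto the previous line's newline
  }
  // No "On " line found: fall back to cutting at the address's own line.
  var fallback = body.lastIndexOf('\n', at) + 1;
  return body.slice(0, fallback).replace(/\s+$/, '');
}

var sampleReply = [
  'This is some new text',
  '',
  'On Sun, Apr 1, 2012 at 3:32 AM, My app <',
  '4f77ed3860c258a567aeabf8@myapp.com> wrote:',
  '',
  '> Original...',
  '> message..'
].join('\n');

console.log(stripAtKnownAddress(sampleReply, '4f77ed3860c258a567aeabf8'));
// prints "This is some new text"
```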

Is there a better, standard way to strip out past messages from a user's reply to an email?

ty.

3 Answers

There is an npm module called emailreplyparser, ported from a GitHub Ruby library, that does this. As you point out, these reply formats are not standardized, so any solution is going to be pretty fragile and imperfect, but whaddayagonnado?

Here's an example where I take a JSON response I got from the new Gmail API and successfully access just the new reply text of a given message.

var erp = require('emailreplyparser').EmailReplyParser.read;
var message = require('./sample_message.json');
// Buffer.from replaces the deprecated `new Buffer(...)` constructor.
// Gmail returns base64url data; Node's 'base64' decoder also accepts the URL-safe alphabet.
var buffer = Buffer.from(message.payload.parts[0].body.data, 'base64');
var body = buffer.toString('utf8');
// body is the whole message: the new text plus the quoted reply portion
// console.log(body);
var parsed = erp(body);
// the first fragment holds just the text of the reply itself
console.log(parsed.fragments[0].content);

Note there may be several interesting fragments if the author interleaved reply text and quoted message fragments.
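To make the interleaving case concrete, here is a minimal sketch that does not use emailreplyparser at all. The `splitFragments` helper and its `quoted` and `content` fields are hypothetical names for illustration only, and the sketch assumes quoted lines always start with `>`, which is not true of every client.

```javascript
// Hypothetical sketch, not the emailreplyparser API: group consecutive
// lines into fragments, marking each as quoted (starts with ">") or not.
function splitFragments(body) {
  var fragments = [];
  body.split('\n').forEach(function (line) {
    var quoted = /^\s*>/.test(line);
    var last = fragments[fragments.length - 1];
    if (last && (last.quoted === quoted || line.trim() === '')) {
      last.lines.push(line); // blank lines stay with the current fragment
    } else {
      fragments.push({ quoted: quoted, lines: [line] });
    }
  });
  return fragments.map(function (f) {
    return { quoted: f.quoted, content: f.lines.join('\n').trim() };
  });
}

var interleaved = [
  'Yes to the first point.',
  '> Should we ship on Monday?',
  'No to the second.',
  '> Should we skip the tests?'
].join('\n');

// Keep every unquoted fragment, not just the first one.
var replies = splitFragments(interleaved)
  .filter(function (f) { return !f.quoted; })
  .map(function (f) { return f.content; });

console.log(replies);
// prints [ 'Yes to the first point.', 'No to the second.' ]
```

Reading only the first fragment of a message like this would drop the second answer, which is why it is worth looping over all fragments.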

Peter Lyons

If you want to remove everything except the most recent post, compare the new message with the previous one character by character. If you don't want to write your own diff parser, check out this lib:

https://github.com/cemerick/jsdifflib

Or, if you want a lightweight algorithm, check this one out:

http://ejohn.org/projects/javascript-diff-algorithm/
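A much simpler stand-in for the diff idea is a line-level set difference, sketched below. It is not what either linked library does; `newLinesOnly` is a hypothetical name, and the sketch assumes the previous message is available as plain text and that quoting only adds `>` prefixes.

```javascript
// Hypothetical sketch: keep only reply lines that do not appear in the
// previous message. Quote markers (">") are stripped before comparing.
// This is a set difference on lines, not a full diff algorithm.
function newLinesOnly(reply, previous) {
  var seen = new Set();
  previous.split('\n').forEach(function (line) {
    seen.add(line.trim());
  });
  return reply.split('\n').filter(function (line) {
    var text = line.replace(/^(\s*>)+\s?/, '').trim();
    return text !== '' && !seen.has(text);
  });
}

var previousMessage = 'Original...\nmessage..';
var replyBody = [
  'This is some new text',
  '',
  'On Sun, Apr 1, 2012 at 3:32 AM, My app <',
  '4f77ed3860c258a567aeabf8@myapp.com> wrote:',
  '',
  '> Original...',
  '> message..'
].join('\n');

console.log(newLinesOnly(replyBody, previousMessage));
```

The quoted lines are dropped, but the two attribution lines survive because they were never part of the previous message, so this approach still needs a separate rule for the "On ... wrote:" header.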

FlavorScape
  • The problem with this is that a diff will incorrectly mark "On Sun, Apr 1...XX wrote:" as part of the new message. It seems like the only solution may just be to learn how each client (gmail, outlook, etc.) responds. – ty. Apr 02 '12 at 20:20
  • I would posit that most providers would always put this on a new line. Can't you just do the diff, then delete the line between the last line break and the next-to-last? So, your example, is that actually multi-line or just how it pasted? – FlavorScape Apr 02 '12 at 21:54
  • It's actually multi-line in the example I posted. My users also have a habit of not preserving the newline between their message and the provider line. I think I can come up with a couple heuristics as I gather the "original message" strings of each client... – ty. Apr 03 '12 at 00:21
  • wow, yeah that's a really annoying problem. I can even imagine different versions of outlook doing it differently. sorry you have to deal with that! – FlavorScape Apr 03 '12 at 22:42
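The per-client heuristics discussed in these comments could be collected in a pattern list like the sketch below. The two patterns are assumptions based on commonly seen formats (a Gmail-style "On ... wrote:" line and an Outlook-style "Original Message" separator); `HEADER_PATTERNS` and `stripQuotedReply` are hypothetical names, and the list would need extending as new clients and locales turn up. It does not handle a header that is glued to the end of the user's text without a newline.

```javascript
// Hypothetical pattern list; real clients and locales vary, so treat these
// as starting points to extend as new formats are observed.
var HEADER_PATTERNS = [
  // Gmail-style attribution; may wrap across lines, so allow newlines
  // but cap the span at 200 characters to avoid matching ordinary prose.
  /^On\s[\s\S]{0,200}?wrote:\s*$/m,
  // Outlook-style separator line.
  /^-{2,}\s*Original Message\s*-{2,}\s*$/mi
];

// Cut the body at the earliest header any pattern finds.
function stripQuotedReply(body) {
  var cut = body.length;
  HEADER_PATTERNS.forEach(function (pattern) {
    var match = pattern.exec(body);
    if (match && match.index < cut) {
      cut = match.index;
    }
  });
  return body.slice(0, cut).replace(/\s+$/, '');
}

var gmailReply = [
  'This is some new text',
  '',
  'On Sun, Apr 1, 2012 at 3:32 AM, My app <',
  '4f77ed3860c258a567aeabf8@myapp.com> wrote:',
  '',
  '> Original...',
  '> message..'
].join('\n');

var outlookReply = [
  'Thanks!',
  '',
  '-----Original Message-----',
  'From: My app',
  'Hello'
].join('\n');

console.log(stripQuotedReply(gmailReply));   // prints "This is some new text"
console.log(stripQuotedReply(outlookReply)); // prints "Thanks!"
```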

Please check my code; I think it covers all the cases. The repo has an unhandled case: if there is more than one reply in the message and the (On <Date> <Email> wrote:) line is split across more than one line, it works incorrectly and includes that (On <Date> <Email> wrote:) line as part of the reply.

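function getReplyOnly(str){
  str = str || '';
  // A complete "On ... wrote:" header at the start of the accumulated text
  var exp = /^(>)*\s*(On\s(\n|.)*wrote:)/m;
  // A line that ends with "wrote:" (written without the overlapping
  // alternatives, which could backtrack heavily on long lines)
  var exp2 = /wrote:$/m;
  // A line that starts with "On"
  var exp3 = /^\s*On/m;

  var arr = str.split('\n');
  var indexes = [];
  var tempStr = '';
  var foundEndWrote = false;
  var foundStartOn = false;

  // Discard everything collected so far: a header plus the quoted text below it
  function clear(){
    tempStr = '';
    indexes = [];
    foundEndWrote = false;
    foundStartOn = false;
  }

  // Walk from the last line up, so each header found drops the lines beneath it
  for(var i = arr.length - 1; i >= 0; i--){
    tempStr = arr[i] + tempStr;
    if(exp2.test(arr[i])){
      foundEndWrote = true;
    }

    if(exp3.test(arr[i])){
      foundStartOn = true;
    }

    indexes.push(i);
    if(exp.test(tempStr) && foundEndWrote && foundStartOn){
      clear();
    }
  }

  // Rebuild the message in original order from the surviving line indexes
  var lines = [];
  for(var j = indexes.length - 1; j >= 0; j--){
    lines.push(arr[indexes[j]]);
  }
  return lines.join('\n');
}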
Mostafa Ahmed