I'm working on an email piping script that needs to save just the reply content and not the original quoted email. I'm using a mime parser class (http://www.phpclasses.org/package/3169-PHP-Decode-MIME-e-mail-messages.html) to get all the information that I need from the email:

Message ID: AANLkTimYRxMJwjLSdcDP5ksM=xxx@mail.gmail.com
Reply ID: 20110316205225.xxx@example.com

Subject: Re: MessageX
To:  q1-1234567890@example.com
From: Someone someone@someothersite.com

Body: Hello,
Blah Blah Blah
-Someone

On Wed, Mar 16, 2011 at 3:52 PM,  <q1-1234567890@example.com> wrote:
> Hello,
>
> Some other blah, blah, blah.
>
> Thank you,
> Me

In the body section, I'm getting the original quoted email. How can I filter this out? I know email clients often add ">" next to quoted content, but I'm not sure if this would be good enough. Thanks for your help.

davishmcclurg
  • It sounds a bit like you are building a customer-support system that accepts replies by email. I've often seen a string like "=============REPLY ABOVE THIS LINE==================" placed in the original email to the "customer", which can then easily be found and used to cut out *all* of the quoted reply. This may not be what you are trying to do at all, but it might be a valid option for you. – Blair McMillan Mar 16 '11 at 22:21
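The marker idea from the comment above can be sketched roughly as follows. This is only an illustration: the constant `REPLY_MARKER` and the function `textAboveMarker()` are hypothetical names, and it assumes the outgoing email contained exactly that marker text and that the body has already been decoded to plain text.

```php
<?php
// Hypothetical marker text; it must match what the outgoing email contained.
const REPLY_MARKER = 'REPLY ABOVE THIS LINE';

/**
 * Return the part of $body that precedes the line containing the marker.
 * If the marker is absent, return the body with trailing whitespace trimmed.
 */
function textAboveMarker(string $body): string
{
    $pos = strpos($body, REPLY_MARKER);
    if ($pos === false) {
        return rtrim($body);
    }

    // Back up to the newline before the marker line so that any
    // leading "> =====" decoration on that line is dropped as well.
    $lineStart = strrpos(substr($body, 0, $pos), "\n");
    $cut = ($lineStart === false) ? 0 : $lineStart;

    return rtrim(substr($body, 0, $cut));
}
```

Note that an attribution line such as "On ... wrote:" is usually inserted by the replying client above the marker, so it would survive this cut and may need separate handling.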

1 Answer

This might be doable with a regular expression. Try:

$text = preg_replace('#(^\w.+:\n)?(^>.*(\n|$))+#mi', "", $text);
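As a rough illustration, here is that pattern applied to the sample body from the question. The wrapper function `stripQuotedReply()` is a hypothetical name, and the line-ending normalization is an added assumption: the pattern only matches `\n`, so a body with CRLF line endings would otherwise keep its "On ... wrote:" line.

```php
<?php
/**
 * Remove an optional "On ... wrote:" line followed by ">"-quoted lines.
 * Sketch only; the function name is illustrative.
 */
function stripQuotedReply(string $text): string
{
    // Assumption: convert CRLF / CR to LF, since the pattern matches "\n" only.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    $text = preg_replace('#(^\w.+:\n)?(^>.*(\n|$))+#mi', '', $text);

    // Drop the blank lines left behind at the end of the reply.
    return rtrim($text);
}

$text = <<<'EOT'
Hello,
Blah Blah Blah
-Someone

On Wed, Mar 16, 2011 at 3:52 PM,  <q1-1234567890@example.com> wrote:
> Hello,
>
> Some other blah, blah, blah.
>
> Thank you,
> Me
EOT;

echo stripQuotedReply($text), "\n";
```

For this sample, only the three reply lines (`Hello,`, `Blah Blah Blah`, `-Someone`) remain.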
mario
  • Thanks for the quick answer, it's working well so far. Are there any cases where this might not work? – davishmcclurg Mar 16 '11 at 22:25
  • There are certainly email clients which allow or use `>` verbatim in mails, and it also shows up if you email source code, diffs, etc. To make the regex a bit more resilient, you could exchange the last `+` for `{2,}`. With that it will only match runs of at least two consecutive `>` lines, which is a much stronger sign of a quoted part. – mario Mar 16 '11 at 22:30
  • There won't ever be a perfect solution to this issue. It's possible (although unlikely) to have content in the email that appears to be the replied-to section. Consider lines 10 to 13 of [this snippet](http://pastebin.com/ZyJvBMbn) (I modified your original example). In all likelihood, those lines would end up being removed. – Mr. Llama Mar 16 '11 at 22:46
  • Alright, I think it's working pretty well. One question: what do the # symbols do in the regex? I couldn't find any information on it. Thanks a lot for your help. – davishmcclurg Mar 16 '11 at 23:04
  • @davishmcclurg, The `#` are just alternative regex delimiters. Using `#...*#` is equivalent to `/...*/` – mario Mar 16 '11 at 23:06
  • @mario This works perfectly, but can you show more regexes for other popular email clients, such as Yahoo, Rediff, Outlook, etc.? – TechCare99 Apr 21 '12 at 11:53
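To make two points from the comments concrete, here is a small sketch comparing the original pattern with the stricter `{2,}` variant, and the `#` delimiters with `/` delimiters. The constant names and the sample text are made up for illustration.

```php
<?php
// Original pattern: one or more consecutive ">" lines count as a quote.
const QUOTE_LOOSE = '#(^\w.+:\n)?(^>.*(\n|$))+#mi';
// Stricter variant: require at least two consecutive ">" lines.
const QUOTE_STRICT = '#(^\w.+:\n)?(^>.*(\n|$)){2,}#mi';
// The same strict pattern written with "/" delimiters instead of "#".
const QUOTE_STRICT_SLASH = '/(^\w.+:\n)?(^>.*(\n|$)){2,}/mi';

// Made-up reply: one legitimate line starting with ">" plus a real quote.
$text = "Use this redirect:\n> out.txt\nThanks\n\n"
      . "On Monday, someone wrote:\n> Hi\n> there\n";

// The loose pattern also removes the legitimate "> out.txt" line
// together with the "Use this redirect:" line above it.
$loose = preg_replace(QUOTE_LOOSE, '', $text);

// The strict pattern keeps the single ">" line and removes only the quote.
$strict = preg_replace(QUOTE_STRICT, '', $text);

// Changing the delimiter does not change the result.
$slash = preg_replace(QUOTE_STRICT_SLASH, '', $text);

var_dump($loose, $strict, $strict === $slash);
```

The trade-off is that the strict variant would leave a genuine one-line quote in place, so which pattern fits better depends on the mail you expect to receive.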