I want to split an email thread, which contains replies and forwards, into its individual conversations.
Here is an example:
On Jul 31, 2013, at 5:15 PM, John Doe wrote:
> example email text
>
>
> *From:* Me [mailto:me@gmail.com]
> *Sent:* Thursday, May 31, 2012 3:54 PM
> *To:* John Doe
> *Subject:* RE: subject
>
> example email text
>
>> Dear David,
>>
>> Greetings from Doha!
>> Kindly enlighten me. I am confused.
>>
>> With regards,
>> Smith
>>
>>> Dear Smith,
>>>
>>> Happy New year!
>>> Love
>>>
>>>> Dear Mr Wong,
>>>> Greetings!
>>>> Yours,
>>>> O
The example above is made up, but the format is realistic. Some emails contain multiple conversations.
I have tried https://github.com/zapier/email-reply-parser and other packages, but unfortunately they cannot be put into production because their performance is not stable.
The pattern is quite clear: the conversations can be separated by counting the number of ">" characters at the start of each line. My initial idea is to go through the whole document, find the quote depth of each line, and then extract the ">", ">>", ">>>" and ">>>>" blocks as separate conversations.
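For reference, here is a minimal sketch of that counting idea in Python. The names `quote_depth` and `split_by_depth` are placeholders I made up, and it assumes every quoted line starts with its ">" markers (optionally separated by single spaces):

```python
import re
from itertools import groupby

# Leading run of '>' markers, each optionally followed by one space ("> >" counts as 2).
QUOTE_PREFIX = re.compile(r"^((?:>\s?)*)")


def quote_depth(line):
    """Return (depth, text): the number of leading '>' markers and the rest of the line."""
    match = QUOTE_PREFIX.match(line)
    return match.group(1).count(">"), line[match.end():]


def split_by_depth(body):
    """Group consecutive lines with the same quote depth into (depth, text) blocks."""
    parsed = [quote_depth(line) for line in body.splitlines()]
    blocks = []
    for depth, group in groupby(parsed, key=lambda item: item[0]):
        text = "\n".join(line_text for _, line_text in group).strip()
        if text:  # skip blocks that contain only blank quoted lines
            blocks.append((depth, text))
    return blocks


if __name__ == "__main__":
    sample = "\n".join([
        "On Jul 31, 2013, at 5:15 PM, John Doe wrote:",
        "> example email text",
        ">",
        ">> Dear David,",
        ">> Smith",
        ">>",
        ">>> Dear Smith,",
    ])
    for depth, text in split_by_depth(sample):
        print(depth, repr(text))
```

This only separates blocks by quote depth. In the example above, the forwarded message introduced by the "*From:*" header sits at the same depth as the text before it, so it would stay merged into the ">" block unless the header lines are detected separately.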
Is there a better way to do this?
Thank you very much!