1

I'm retrieving the raw text of messages (header and body) from a POP server. I need to capture everything after the header, which is separated from the user's message by a blank line.

At the same time, I want to ignore any quoted original message if the email is a reply. In the emails I'm parsing, the quoted original starts with

------Original Message------

An example email might look like this

Return-Path: ...
...
More Email Metadata: ...

Hello from regex land, I'm glad to hear from you.
------Original Message------
Metadata: ...
...

Hey regex dude, can you help me? Thanks!

Sincerely, Me.

I need to extract "Hello from regex land, I'm glad to hear from you." and any other text/lines prior to the original message.

I'm using this regex right now (C#, in multiline mode) and it seems to work, except that it captures ------Original Message------ if the body is blank. I'd rather just get an empty string instead.

^\s*$\n(.*)(\n------Original Message------)?
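To make the symptom concrete, here is a minimal repro. It assumes LF line endings and made-up header values, and `OriginalPatternDemo` and `CaptureBody` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The pattern from the question, unchanged.
    public const string Pattern = @"^\s*$\n(.*)(\n------Original Message------)?";

    // Returns capture group 1, or an empty string when nothing matches.
    public static string CaptureBody(string rawMessage)
    {
        Match match = Regex.Match(rawMessage, Pattern, RegexOptions.Multiline);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

For a reply whose new text is one line, group 1 holds that line. When the reply marker directly follows the blank line after the header, `(.*)` consumes the marker line itself, because the trailing group is optional and nothing stops `.*` from matching the marker.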

Edit
I haven't downvoted anyone. If you do downvote, it's usually helpful to include a comment.

Jeff LaFay

3 Answers

0

The reason for this is that you have an extra \n inside the parentheses. If the body is blank, there is no extra newline there. Therefore, try this:

^\s*$\r\n(.*)(^------Original Message------$)?

If you don’t want the newline at the end of the body, you can still use string.Trim() on the matched part.

Note: This assumes that the input uses \r\n line terminators (which is required in e-mail headers according to the MIME standard).
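The line-terminator assumption matters for the `$` anchor in particular. In .NET's multiline mode, `$` matches only before `\n` (or at the end of the input), so a marker line that ends in `\r\n` is not matched by `...$` unless the `\r` is consumed first. The sketch below illustrates this. It is not the pattern from this answer, and `LineEndingDemo` and `HasMarkerLine` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class LineEndingDemo
{
    // True when some line of `text` consists solely of the reply marker.
    // With tolerateCarriageReturn, an optional \r is allowed before the line end,
    // so both CRLF and LF input are handled.
    public static bool HasMarkerLine(string text, bool tolerateCarriageReturn)
    {
        string pattern = tolerateCarriageReturn
            ? @"^------Original Message------\r?$"
            : @"^------Original Message------$";
        return Regex.IsMatch(text, pattern, RegexOptions.Multiline);
    }
}
```

With CRLF input such as `"Hi\r\n------Original Message------\r\nOld"`, the strict pattern finds no marker line while the `\r?$` variant does. With LF input both variants find it.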

Timwi
-1

Why don't you use DotnetOpenMail? Using a regex for this is the wrong approach; you'd be better off using a dedicated email handler instead.

t0mm13b
  • I'm using a POP3 client that I was told to use and instead of retrieving messages as objects (as I would prefer), I can only retrieve raw text for each message. Otherwise this wouldn't be an issue. – Jeff LaFay Sep 08 '10 at 14:37
  • Uhhh... that does not really make sense using regex for this... what pop3 client are you using - that pop3 client should be taking care of the handling of the body of the message etc... otherwise regex would not be needed!! – t0mm13b Sep 08 '10 at 14:45
  • Thanks for trying to help, tommie. Let's put it in this perspective then. I have POP3 mail client code and I'm extending it to instantiate a MailMessage object for each message retrieved from the POP server. Now I'm writing methods to extract portions of the raw text to hydrate the object properties. – Jeff LaFay Sep 08 '10 at 14:46
  • And I agree.. all of this wouldn't be needed if that were the case :) – Jeff LaFay Sep 08 '10 at 14:49
  • tommie, I think I may be asking for too much in a regex capture. I'm going to try out DotnetOpenMail. Thanks for pointing me in the right direction. – Jeff LaFay Sep 08 '10 at 15:08
-1

Replace (\n------Original Message------) with the lookahead (?=(\n------Original Message------)). The lookahead only checks that the marker is there and does not include it in the match.

El Ronnoco
  • This is better. The problem is that it doesn't account for emails that don't contain "Original Message". Much closer though, thanks. – Jeff LaFay Sep 08 '10 at 14:46
  • What are the alternative terminators other than `original message`? – El Ronnoco Sep 08 '10 at 14:48
  • I just want it to stop capturing before original message line. Not all emails will have that line, just most of them do. So if that line doesn't exist it's a new email and not a reply. I want all of that captured. – Jeff LaFay Sep 08 '10 at 14:54
  • Who gave me a downvote and what's the reason?! Perhaps try `(?=(\n------Original Message------|$))` which should take you to the end of the message. – El Ronnoco Sep 08 '10 at 15:42
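Pulling the requirements from this thread together, one possible sketch follows. It is an assumption-laden variant rather than any poster's exact pattern: it adds `RegexOptions.Singleline` and a lazy `(.*?)` so the capture can span several lines, and the lookahead accepts either the marker at the start of a line or the end of the input (`\z`) for messages that are not replies. `ReplyBodyExtractor` and `ExtractNewText` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplyBodyExtractor
{
    // ^\s*$\n   : the first blank line, which ends the header (\s also absorbs a \r).
    // (.*?)     : lazily capture the new text, across lines thanks to Singleline.
    // (?=...)   : stop before a line starting with the reply marker, or at end of input.
    private static readonly Regex BodyPattern = new Regex(
        @"^\s*$\n(.*?)(?=^------Original Message------|\z)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Returns the text written before any quoted original message,
    // or an empty string when the body is blank or no blank line is found.
    public static string ExtractNewText(string rawMessage)
    {
        Match match = BodyPattern.Match(rawMessage);
        return match.Success ? match.Groups[1].Value.Trim() : string.Empty;
    }
}
```

Because the capture is lazy, a marker that immediately follows the header yields an empty string instead of the marker line. `Trim()` removes the newline left before the marker. A regex like this still treats the message as plain text and does nothing about encodings or multipart bodies, which is where a dedicated mail library helps.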