
I have the following input:

Input

Random Line 1
Random Line 2
From: person1@example.com
Date: 01-01-2011
To: friend@example.com
   Subject: One
Random Line 3
Random Line 4
From: person2@example.com
   Subject: Two
Random Line 5
From: person3@example.com
   Subject: Three
This is the end

The following is the text I expect to match:

Expected Output

From: person2@example.com
   Subject: Two

Note: There may be zero or multiple lines in between From: person2@example.com and Subject: Two

I tried this regular expression:

/(From.*?Subject:\s*Two)/m

The regex above starts matching at the first From instead of the one just before Subject: Two. Can anyone help me match the expected output? Thanks in advance.

MIZ
  • If there are lines between `"From: person2@example.com"` and `"Subject: Two"`, do you want those lines returned (as well as `"From:..\n"` and `"Subject: Two"`)? – Cary Swoveland Jul 08 '14 at 20:40

4 Answers

3

Add .* before your regex to get only the expected two lines.

.*(From.*?Subject:\s*Two)

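Because the leading .* is greedy, the regex engine first consumes as much as it can and then backtracks, so the From it settles on is the last one that is still followed by Subject: Two (that is, the From just before the line containing Two). The text from that From up to Two is captured in group 1; the non-greedy .*? keeps that capture as short as possible. In Ruby this needs the /m flag so that . also matches newlines.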
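A minimal Ruby sketch of the idea above, assuming the /m flag from the question is kept and using a shortened copy of the question's input. The variable names `text`, `pattern` and `matched` are only for illustration.

```ruby
# Shortened sample input from the question (<<~ strips only the common indentation).
text = <<~MAIL
  Random Line 1
  From: person1@example.com
  To: friend@example.com
     Subject: One
  Random Line 3
  From: person2@example.com
     Subject: Two
  Random Line 5
  From: person3@example.com
     Subject: Three
  This is the end
MAIL

# /m lets . match newlines in Ruby. The leading .* is greedy, so the
# "From" that ends up in group 1 is the last one that can still reach
# "Subject: Two".
pattern = /.*(From.*?Subject:\s*Two)/m

# String#[] with a capture index returns group 1, or nil when nothing matches.
matched = text[pattern, 1]
puts matched
```

Note that the overall match ($&) also contains everything swallowed by the leading .*, so it is group 1 that holds the two expected lines.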


Avinash Raj
  • @Mohamed It works if you have many lines in-between From: person2@example.com and Subject: Two. – Avinash Raj Jul 08 '14 at 10:27
  • If you are returning everything between (the required) `"From"` and `"Subject:\s*Two"`, I don't see the need for the non-greedy modifier (assuming there is at most one line containing `"Subject: Two"`, and if we don't know if that's true the question is unclear). – Cary Swoveland Jul 09 '14 at 05:29
  • @CarySwoveland you mean this `.*(From.*Subject:\s*Two)`? – Avinash Raj Jul 09 '14 at 05:32
2

If the Subject line immediately follows the From line (only one newline between them), this works:

/(From[^\n]*\n\s*Subject:\s*Two)/m


Also, I believe that removing the /m flag makes it even simpler, because . then no longer matches newlines:

/(From.*?\s*Subject:\s*Two)/


If you might have lines in the middle, you need to use negative lookahead:

/(From[^\n]*\n(^(?!From)[^\n]*\s*)*Subject:\s*Two)/m


This regex does the following:

  1. From[^\n]*\n - matches a text starting with From up to the end of the line
  2. (^(?!From)[^\n]*\s*)* - matches zero or more lines not beginning with From (negative lookahead)
  3. Subject:\s*Two - matches a text containing Subject: [whitespace] Two
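Here is a short sketch of how the lookahead version behaves when a line sits between From and Subject. The sample text is hypothetical (the second Date line is not in the question), and `text`, `pattern` and `block` are illustrative names.

```ruby
# Hypothetical sample with one extra header line between From and Subject.
text = <<~MAIL
  From: person1@example.com
  Date: 01-01-2011
     Subject: One
  From: person2@example.com
  Date: 02-01-2011
     Subject: Two
  From: person3@example.com
     Subject: Three
MAIL

# The repeated group may only consume lines that do not start with "From",
# so a match can never run across the start of another message.
pattern = /(From[^\n]*\n(^(?!From)[^\n]*\s*)*Subject:\s*Two)/m

# Group 1 is the whole block from the From line to "Subject: Two".
block = text[pattern, 1]
puts block
```

The attempt that starts at person1 fails because the lookahead refuses to step over the next From line, so the engine moves on and matches from person2.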
Uri Agassi
2

This is another way:

Code

text.scan(/.*(From:.*?\n).*(Subject: Two)/m).join

Example

text = <<_
Line 1
From: person1@example.com
To: friend@example.com
   Subject: One
Line 5
From: person2@example.com
Line 7
   Subject: Two
Line 9
From: person3@example.com
   Subject: Three
The End
_

text.scan(/.*(From:.*?\n).*(Subject: Two)/m).join
  #=> "From: person2@example.com\nSubject: Two"

Explanation

The regex

r = /.*(From:.*?\n).*(Subject: Two)/m

skips all characters until it reaches the last string "From:...\n" that is followed (after some non-matching characters) by the string "Subject: Two". Specifically:

  • .*, being greedy, consumes as many characters as it can, including lines "From:...\n" that do not match the regex, up to the beginning of the first capture group.
  • (From:.*?\n) is the first capture group, capturing from "From:" to the end of that line. The ? in .*? makes .* non-greedy, so that it stops at the first \n it reaches.
  • .* consumes all following characters until it reaches the second capture group.
  • (Subject: Two) is the second capture group.
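To see the two capture groups separately, here is a small sketch using String#match instead of scan. The text is a shortened version of the example above; `m`, `from_line`, `subject_text` and `joined` are illustrative names.

```ruby
text = <<~MAIL
  Line 1
  From: person1@example.com
     Subject: One
  From: person2@example.com
  Line 7
     Subject: Two
  Line 9
  From: person3@example.com
     Subject: Three
MAIL

r = /.*(From:.*?\n).*(Subject: Two)/m

m = text.match(r)
from_line    = m[1]             # first capture: the From line, including its "\n"
subject_text = m[2]             # second capture: the literal "Subject: Two"
joined       = m.captures.join  # same string that scan(...).join produces here

p from_line
p subject_text
p joined
```

Because only the two groups are joined, anything between them (here "Line 7" and the indentation before Subject) is not part of the result.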
Cary Swoveland
1

Use this:

if subject =~ /^From[^\r\n]*\s*\S*Subject: Two/
    match = $&
else
    match = ""
end

Explanation

  • The ^ anchor asserts that we are at the beginning of a line
  • From matches literal chars
  • [^\r\n]* matches any chars that are not line breaks
  • \s* matches any whitespace, including line breaks
  • \S* matches any non-whitespace chars
  • Subject: Two matches literal chars
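As a quick check of the pattern just described, here is a sketch on two hypothetical strings: one where Subject directly follows From, and one with a line in between. `adjacent`, `separated` and the result names are made up for this example.

```ruby
pattern = /^From[^\r\n]*\s*\S*Subject: Two/

# Subject on the line right after From: the pattern matches.
adjacent = "Random Line 4\nFrom: person2@example.com\n   Subject: Two\nRandom Line 5\n"
match_adjacent = adjacent[pattern] || ""

# A line between From and Subject: this single-line version finds nothing,
# because \s*\S* cannot step over "Date: 01-01-2011".
separated = "From: person2@example.com\nDate: 01-01-2011\n   Subject: Two\n"
match_separated = separated[pattern] || ""

p match_adjacent
p match_separated
```

String#[] with a regex returns the matched text or nil, so `|| ""` gives the same empty-string fallback as the if/else above.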

Multi-Line Version

In response to your comment and new note, here is another version that will allow multiple lines between the From and the Two:

if subject =~ /^From(?:(?:(?!^From).)*+\s*+)*\S*Subject: Two/
    match = $&
else
    match = ""
end
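A small sketch of the multi-line version on a hypothetical text with one line between From and Subject. It relies on possessive quantifiers (*+), which Ruby's regex engine supports; `text`, `pattern` and `match` are illustrative names.

```ruby
text = <<~MAIL
  From: person1@example.com
     Subject: One
  From: person2@example.com
  Line 7
     Subject: Two
  Line 9
  From: person3@example.com
     Subject: Three
MAIL

# (?!^From) stops the scan at the start of the next "From" line, so a match
# that begins at person1 cannot reach "Subject: Two".
pattern = /^From(?:(?:(?!^From).)*+\s*+)*\S*Subject: Two/

match = text[pattern] || ""
puts match
```

Unlike the capture-and-join approach, the lines between From and Subject are kept in the returned text.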
zx81
  • FYI, added explanation. :) – zx81 Jul 08 '14 at 09:34
  • Thanks for the solution. Added a Note in the question. – MIZ Jul 08 '14 at 10:05
  • You're welcome, Mohamed. Added a Multiple Line version at the bottom of the answer, let me know if that's what you want. :) – zx81 Jul 08 '14 at 11:54
  • zx, when I apply the multi-line version to `text` in my answer, I get `#=> "From: person2@example.com\nLine 7\n Subject: Two"`. btw, you could also write that, `text[/^From(?:(?:(?!^From).)*+\s*+)*\S*Subject: Two/] || ''`. – Cary Swoveland Jul 08 '14 at 19:11
  • @CarySwoveland Yes, that's what he wants. :) The idea of the multi-line version is that there can be multiple carriage returns between the `From` and the `Two`. – zx81 Jul 08 '14 at 20:16
  • Mohamed, you asked for the multiple line version. Did you try it, and did it work? – zx81 Jul 08 '14 at 20:16
  • I've asked for clarification. – Cary Swoveland Jul 08 '14 at 20:41