I may have something like this:

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|B (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|B (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

I only want to capture everything from FIRST to SECOND|B and exclude anything from FIRST to SECOND|A. The order in this post is just an example and may differ in the files I am working with. The text in brackets could be words, digits, special characters, etc.; (newline) just indicates that what follows is on a new line.

I have tried https://regex101.com/r/CwzCyz/2 (FIRST[\s\S]+SECOND\|B), but that matches from the first FIRST to the last SECOND|B. The pattern runs on regex101.com but not in my PowerShell ISE application, which I am guessing is because I have the regex101 flavor set to PCRE (PHP).

codewario
user401
  • Use `-Raw` when getting file content and then use `(?s)FIRST.*?SECOND\|B` – Wiktor Stribiżew Nov 27 '19 at 18:23
  • That doesn't quite work, because `SECOND|A` will still be returned as part of the match until `SECOND|B` is encountered. – codewario Nov 27 '19 at 18:29
  • Are `A` and `B` really the letters A and B, or do they represent something else? Could we find `SECOND|C` or `SECOND|Z` or something else? – Toto Nov 27 '19 at 19:26
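
To make the behaviour discussed in the question and comments concrete, here is a minimal Python sketch. Python's `re` module is used only because it is easy to run, and the sample text and variable names are made up for illustration. It suggests that the greedy attempt yields one long match, while the lazy `(?s)FIRST.*?SECOND\|B` variant still lets each match run across a `SECOND|A` block:

```python
import re

# Hypothetical sample mirroring the layout in the question.
SAMPLE = (
    "FIRST|one\nbody 1\nSECOND|A\n"
    "FIRST|two\nbody 2\nSECOND|B\n"
    "FIRST|three\nbody 3\nSECOND|A\n"
    "FIRST|four\nbody 4\nSECOND|B\n"
    "FIRST|five\nbody 5\nSECOND|A\n"
)

# Greedy: [\s\S]+ grabs as much as possible, so the single match runs
# from the first FIRST to the last SECOND|B.
greedy = re.findall(r"FIRST[\s\S]+SECOND\|B", SAMPLE)

# Lazy: each match stops at the nearest SECOND|B, but it still starts
# at the earliest FIRST, so a SECOND|A block ends up inside it.
lazy = re.findall(r"(?s)FIRST.*?SECOND\|B", SAMPLE)

print(len(greedy), len(lazy))
for match in lazy:
    print(repr(match))
```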

2 Answers

1

FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B

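This will not match a FIRST| whose block ends in SECOND|A (or in SECOND| followed by anything other than B), because the lookahead stops the match from advancing past such a SECOND| marker.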

https://regex101.com/r/e0CG9B/1

Expanded

 FIRST \| 
 (?:
      (?! SECOND \| [^B] )
      [\S\s] 
 )*?
 SECOND \| B
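
As a quick check of the pattern above, the following Python sketch runs it against made-up sample data. The sample text and names are assumptions for illustration, and Python's `re` stands in for the target engine. Only the blocks that end in `SECOND|B` should come back:

```python
import re

# Hypothetical sample mirroring the layout in the question.
SAMPLE = (
    "FIRST|one\nbody 1\nSECOND|A\n"
    "FIRST|two\nbody 2\nSECOND|B\n"
    "FIRST|three\nbody 3\nSECOND|A\n"
    "FIRST|four\nbody 4\nSECOND|B\n"
    "FIRST|five\nbody 5\nSECOND|A\n"
)

# Each consumed character is first checked by the negative lookahead:
# the match may not step onto a SECOND| that is followed by a non-B.
PATTERN = re.compile(r"FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B")

matches = PATTERN.findall(SAMPLE)
for match in matches:
    print(repr(match))
```

The pattern contains no capturing groups, so `findall` returns the full text of each match.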

If you need the innermost FIRST / SECOND|B pair (with no other FIRST| or SECOND| in between), it has to be done a different way:

FIRST\|(?:(?!(?:FIRST|SECOND)\|)[\S\s])*SECOND\|B

https://regex101.com/r/qoT8U1/1
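
The difference between the two patterns only shows up when a FIRST| has no SECOND| of its own. The Python sketch below uses an invented input with such an "orphan" FIRST| line to illustrate it; the input and the names `loose` and `strict` are assumptions for illustration only:

```python
import re

# Hypothetical input: the first FIRST| has no terminating SECOND| line.
TEXT = (
    "FIRST|orphan\nno terminator here\n"
    "FIRST|two\nbody 2\nSECOND|B\n"
)

# First pattern: only refuses to cross a SECOND| followed by a non-B,
# so it may start at the orphan FIRST| and run through the next FIRST|.
loose = re.findall(r"FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B", TEXT)

# Second pattern: refuses to cross any FIRST| or SECOND|, so only the
# innermost FIRST| ... SECOND|B pair can match.
strict = re.findall(r"FIRST\|(?:(?!(?:FIRST|SECOND)\|)[\S\s])*SECOND\|B", TEXT)

print(loose)
print(strict)
```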

1

If FIRST is at the start of a line and SECOND|A or SECOND|B is at the start of a line, you could match all following lines that do not start with SECOND\|[AB]:

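^FIRST.*(?:\r?\n(?!SECOND\|[AB]\b).*)*\r?\nSECOND\|B\b.*

In parts

  • ^FIRST.* Match FIRST at the start of a line, followed by the rest of that line (this needs the multiline option so that ^ matches at every line start)
  • (?: Non-capturing group
    • \r?\n(?!SECOND\|[AB]\b) Match a newline and assert that the next line does not start with SECOND|A or SECOND|B
    • .* Match any char except a newline, 0+ times
  • )* Close the non-capturing group and repeat it 0+ times
  • \r?\n Match a newline
  • SECOND\|B\b.* Match the line that starts with SECOND|B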
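
The line-based approach can be sketched in Python as follows. Two assumptions are made here: the non-capturing group is repeated with `*` so that any number of body lines is allowed, and the multiline flag is enabled so that `^` matches at the start of every line (in .NET-based tools the inline `(?m)` option would presumably play the same role). The sample data is invented for illustration:

```python
import re

# Hypothetical sample with bodies of 1, 2 and 0 lines.
SAMPLE = (
    "FIRST|one\nbody 1\nSECOND|A\n"
    "FIRST|two\nbody 2a\nbody 2b\nSECOND|B\n"
    "FIRST|three\nSECOND|B\n"
)

# re.MULTILINE lets ^ anchor at each line start. The repeated group
# consumes whole lines as long as they do not start with SECOND|A or
# SECOND|B; the pattern then requires a closing SECOND|B line.
PATTERN = re.compile(
    r"^FIRST.*(?:\r?\n(?!SECOND\|[AB]\b).*)*\r?\nSECOND\|B\b.*",
    re.MULTILINE,
)

matches = PATTERN.findall(SAMPLE)
for match in matches:
    print(repr(match))
```

Because the body is consumed a whole line at a time, a block that ends in SECOND|A fails as soon as its closing line is reached instead of being extended into the next block.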

The fourth bird