
I have been racking my brain with a particular regex problem I currently have.

I want to match values that could span multiple lines. If the data continues onto the next line, the current line ends with a space followed by an underscore (" _"). However, a line may also contain a space and underscore in the middle of valid text ("This _is"), and that must not be treated as a continuation.

See the below text sample:

This is d_ata 1
_This is _
data 2
This _is data 3a
This _
is _
data 4

The results should be the following:

Match 1

This is d_ata 1

Match 2

_This is _
data 2

Match 3

This _is data 3a

Match 4

This _
is _
data 4

I don't care about matching the content, just about making sure each match ends at the correct line end.

Edit: See Robby's negative lookbehind solution below.

I had tried it earlier with some additional logic, which turned out to be too complicated for my regex provider to handle. I simplified it and it worked a treat.

NMGod
  • I don't understand the logic used to arrive at your expected output. – Tim Biegeleisen Mar 31 '17 at 05:11
  • @TimBiegeleisen the ` _` at the end of a line is a continuation character that indicates that data will continue on the next line. Like the `\` at the end of a line in bash and Java properties files. – Robby Cornelissen Mar 31 '17 at 05:16

1 Answer

This PCRE expression should deliver the required result:

/^.*?(?<! _)$/gms

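This uses the negative lookbehind (?<! _) in combination with the multiline flag (m), so that $ can match at every line end and the match stops at the first line end that is not preceded by " _" (a space followed by an underscore). The single-line flag (s) ensures that the dot also matches newlines, which lets the lazy .*? run across continuation lines.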
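As a quick way to check the behaviour outside regex101, here is a minimal sketch in Python. Python's `re` module is not PCRE, but it supports fixed-width lookbehind and equivalent flags, so the same pattern should carry over. The names `RECORD`, `SAMPLE` and `split_records` are made up for this illustration.

```python
import re

# Same pattern as in the answer. re.MULTILINE corresponds to the m flag and
# re.DOTALL to the s flag. There is no g flag in Python; finditer() already
# walks through every match.
RECORD = re.compile(r"^.*?(?<! _)$", re.MULTILINE | re.DOTALL)

# Sample data from the question, deliberately without a trailing newline.
SAMPLE = (
    "This is d_ata 1\n"
    "_This is _\n"
    "data 2\n"
    "This _is data 3a\n"
    "This _\n"
    "is _\n"
    "data 4"
)


def split_records(text):
    """Return each logical record, keeping continuation lines together."""
    return [match.group(0) for match in RECORD.finditer(text)]


for number, record in enumerate(split_records(SAMPLE), start=1):
    print(f"Match {number}:\n{record}\n")
```

Note that blank lines match as empty records, and in Python an input that ends with a newline can produce one extra empty match at the very end, so it may be worth filtering out empty strings depending on your data.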

Here's a regex101 example.

Robby Cornelissen
  • I had this exact syntax (among many others I had tried), with some extra matching at the start of the line. Given this was working for you, I simplified some of the starting logic and it worked. Turns out the extra logic at the start was just too complicated for the regex engine to handle. Much appreciated. – NMGod Mar 31 '17 at 06:00