166

I have a text file that denotes remarks with a single '.

Some lines have two quotes, but I need to get everything from the first instance of a ' to the line feed.

I AL01                  ' A-LINE                            '091398 GDK 33394178    
         402922 0831850 '                                   '091398 GDK 33394179    
I AL02                  ' A-LINE                            '091398 GDK 33394180    
         400722 0833118 '                                   '091398 GDK 33394181    
I A10A                  ' A-LINE 102                       '  53198 DJ  33394182    
         395335 0832203 '                                  '  53198 DJ  33394183    
I A10B                  ' A-LINE 102                       '  53198 DJ  3339418
ᔕᖺᘎᕊ
user38349

7 Answers

232
'.*

I believe you also need the Multiline option.
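
As a minimal sketch of this pattern in Python (the `text` value is a shortened, hypothetical stand-in for the file in the question): in Python's `re` module the MULTILINE flag only changes `^` and `$`, so `'.*` works without it, because `.` does not match a line feed by default. Other tools may behave differently.

```python
import re

# Shortened stand-in for the question's file (an assumption, not the real data).
text = "I AL01 ' A-LINE '091398 GDK\n402922 0831850 ' '091398 GDK\n"

# "." does not match "\n" by default, so each match stops at the end of its line.
remarks = re.findall(r"'.*", text)
print(remarks)
```

Each element of `remarks` starts at the first ' on a line and runs to the end of that line.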

Joshua Belden
126

The appropriate regex would be the ' char followed by any number of any chars [including zero chars] ending with an end of string/line token:

'.*$

And if you wanted to capture everything after the ' char but not include it in the output, you would use:

(?<=').*$

This basically says give me all characters that follow the ' char until the end of the line.
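
A small Python sketch of the lookbehind variant, assuming a shortened, hypothetical sample in place of the real file. `re.MULTILINE` is passed so that `$` matches at the end of every line rather than only at the end of the text.

```python
import re

# Shortened stand-in for the question's file (an assumption, not the real data).
text = "I AL01 ' A-LINE '091398 GDK\n402922 0831850 ' '091398 GDK\n"

# (?<=') requires a ' immediately before the match without including it.
after_quote = re.findall(r"(?<=').*$", text, flags=re.MULTILINE)
print(after_quote)
```

The first ' on each line is left out of the result, while any later quote on the same line is kept as ordinary text.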

Edit: It has been noted that the $ is not strictly required here: by default . does not match a line break, so .* already stops at the end of the line. The pattern

'.*

is therefore technically correct. However, I prefer to be explicit and avoid confusion during later code maintenance, hence my use of the $. I believe it is better to declare behaviour explicitly than to rely on implicit behaviour wherever clarity could be questioned.
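
One caveat worth checking in your own engine: in Python's `re` module, `$` only matches at the end of every line when MULTILINE is set. The sketch below (with a made-up two-line `text`) compares the three combinations; it illustrates Python's behaviour only and may not carry over to other regex flavours.

```python
import re

text = "a 'one\nb 'two\n"  # hypothetical two-line sample

# No anchor: "." stops at each line feed on its own.
no_anchor = re.findall(r"'.*", text)

# Anchor plus MULTILINE: "$" matches at the end of every line.
anchored_multiline = re.findall(r"'.*$", text, flags=re.MULTILINE)

# Anchor without MULTILINE: "$" matches only at the very end of the text
# (or just before a final line feed), so only the last line is found.
anchored_only = re.findall(r"'.*$", text)

print(no_anchor, anchored_multiline, anchored_only)
```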

BenAlabaster
  • The $ is unnecessary. The dot will stop at the end of the line under normal circumstances. – Tomalak May 06 '09 at 18:00
  • unnecessary - but proper for what he wants to do. It serves as a reminder later that it is expecting everything from ' to the end of the line – gnarf May 06 '09 at 18:03
  • @balabaster: I did not say that it was wrong. ;-) It was just a footnote. – Tomalak May 06 '09 at 18:09
  • @Tomalak: Wasn't trying to imply you were wrong by any means, was just clarifying my reasoning for my choice of using $ rather than not. Thank you for pointing it out. – BenAlabaster May 06 '09 at 18:10
  • +1 for including how to include everything after the character in question, instead of always including it. – grizzasd Oct 07 '19 at 18:30
  • (?<=').*$ is actually the correct answer to what the OP is asking: capture after, not with. This should be the accepted answer – Aistis Taraskevicius Feb 26 '20 at 11:09
32
'.*$

Starting with a single quote ('), match any character (.) zero or more times (*) until the end of the line ($).
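
The same breakdown can be kept next to the pattern itself. This is a sketch using Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments; the sample line is a shortened, hypothetical version of one line from the question.

```python
import re

pattern = re.compile(
    r"""
    '      # a literal single quote
    .*     # any character except a line break, zero or more times
    $      # the end of the line
    """,
    re.VERBOSE | re.MULTILINE,
)

line = "I AL01 ' A-LINE '091398 GDK"  # hypothetical shortened sample line
match = pattern.search(line)
remark = match.group() if match else None
print(remark)
```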

OtherDevOpsGene
  • This answer is a great example of how to break down the logic behind a command, nice and clear! – Timmah Aug 26 '19 at 06:40
19

When I tried '.* in Notepad++ on Windows, it matched everything from the first ' to the end of the last line.

To capture everything only until the end of that line, I typed the following:

'.*?\n

This captures only from the ' to the end of that line.
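
The behaviour described here resembles what happens when `.` is allowed to match line breaks. Assuming that was the cause (for example, a ". matches newline" setting being enabled), the effect can be reproduced in Python with `re.DOTALL`; the two-line `text` is made up for illustration.

```python
import re

text = "a 'one\nb 'two\n"  # hypothetical two-line sample

# With DOTALL, "." also matches "\n", so the greedy ".*" runs to the end of the text.
greedy_dotall = re.findall(r"'.*", text, flags=re.DOTALL)

# The lazy ".*?" stops at the first "\n" it can reach, one match per line.
lazy_to_newline = re.findall(r"'.*?\n", text, flags=re.DOTALL)

print(greedy_dotall, lazy_to_newline)
```

Note that the lazy version includes the line feed in each match.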

Danish
12

In your example I'd go for the following pattern:

'([^\n]+)$

Use the multiline and global options to match all occurrences.

To include the linefeed in the match you could use:

'[^\n]+\n

But this might miss the last line if it has no linefeed.
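
A short Python sketch of that difference, using a made-up `text` whose last line has no trailing line feed. In Python, `re.findall` plays the role of the global option, and it returns the contents of the capture group when the pattern has exactly one.

```python
import re

text = "a 'one\nb 'two"  # hypothetical sample; the last line has no line feed

# Requiring "\n" at the end skips the final line.
with_linefeed = re.findall(r"'[^\n]+\n", text)

# Anchoring with "$" in MULTILINE mode finds both lines.
captured = re.findall(r"'([^\n]+)$", text, flags=re.MULTILINE)

print(with_linefeed, captured)
```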

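For a single line, if you don't need to match the linefeed, I'd prefer to use the following (note that inside a character class $ is a literal dollar sign, so [^$]+ matches any character except $, line breaks included):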

'[^$]+$
Gess
  • Had trouble with this suggestion with golang's regex. `'[^\n]+` was needed instead of `'[^\n]+$`. See https://play.golang.org/p/EemihqdIMSl – jws Sep 16 '21 at 14:13
5

This will capture everything up to the ' in backreference 1 and everything after the ' in backreference 2. Depending on the language, you may need to escape the apostrophes (\').

/^([^']*)'?(.*)$/

Quick modification: if the line doesn't have an ' - backreference 1 should still catch the whole line.

^ - start of string
([^']*) - capture any number of not ' characters
'? - match the ' 0 or 1 time
(.*) - capture any number of characters
$ - end of string
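
A minimal Python sketch of the two groups, assuming a shortened, hypothetical sample line. The surrounding slashes are delimiter syntax from other languages and are dropped here.

```python
import re

pattern = re.compile(r"^([^']*)'?(.*)$")

# A line with a quote: group 1 is the text before it, group 2 the text after it.
m = pattern.match("I AL01 ' A-LINE '091398 GDK")
before, after = m.group(1), m.group(2)

# A line without any quote: group 1 takes the whole line, group 2 is empty.
m2 = pattern.match("no quote here")
whole, rest = m2.group(1), m2.group(2)

print(repr(before), repr(after), repr(whole), repr(rest))
```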
gnarf
0

https://regex101.com/r/Jjc2xR/1

/(\w*\(Hex\): w*)(.*?)(?= |$)/gm

I'm sure this one works; it captures the hex serial number from the loosely structured multiline text below:

     Space Reservation: disabled
         Serial Number: wCVt1]IlvQWv
   Serial Number (Hex): 77435674315d496c76515776
               Comment: new comment

I'm an eternal newbie at regex, but I'll try to explain this one:

(\w*\(Hex\): w*) : finds the part of the line that contains "(Hex): "

(.*?) : the second capture group; it lazily matches everything that follows

(?= |$) : a lookahead that ends the match at the next space or at the end of the line

So the second group holds the value.
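
As a rough check in Python, the pattern can be run against the sample text from this answer. The pattern is copied as posted (without the `/.../gm` delimiters); `re.MULTILINE` stands in for the `m` flag, and `search` returns only the first match, which is enough here.

```python
import re

text = (
    "     Space Reservation: disabled\n"
    "         Serial Number: wCVt1]IlvQWv\n"
    "   Serial Number (Hex): 77435674315d496c76515776\n"
    "               Comment: new comment\n"
)

# Pattern as posted; MULTILINE lets "$" match at the end of each line.
pattern = re.compile(r"(\w*\(Hex\): w*)(.*?)(?= |$)", re.MULTILINE)

match = pattern.search(text)
hex_serial = match.group(2) if match else None
print(hex_serial)
```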

Xavius Pupuss