I need to extract a fixed number of characters before and after a regex match. For example:
Input: ABCDEFGHIJK//MNOPQRST Output: IJK//MNOPQ

Input: zzzABCDEFGHIJK//MNOPQRST Output: (empty, the line is excluded)

I want only the 3 characters immediately before "//" and the 5 characters immediately after it. Lines that start with zzz should also be excluded.

The regex I am currently using to search for //:

^(?!.*zzz)?=.{0,3}//.{0,100}[a-zA-Z0-9])(?=\S+$).{2,5000} --- Not working
(?=.{0,3}//.{0,100}[a-zA-Z0-9])(?=\S+$).{2,5000} --- Working

https://regex101.com/r/ry6Y09/1 --- Regex demo

I need to limit the number of characters matched on each side of //.

  • Do you just want `.{0,3}//.{0,5}`? See https://regex101.com/r/hMM7dV/1 – Wiktor Stribiżew Mar 22 '21 at 14:07
  • How about `.{3}\/\/.{5}`? – Chris Doyle Mar 22 '21 at 14:08
  • @WiktorStribiżew Seems like just `.{3}//.{5}` would suffice based on OP's post but clarification from OP is desirable. – MonkeyZeus Mar 22 '21 at 14:08
  • Thank you, it is working but with one issue. I also have to exclude the line that starts with zzz. That part is already in my current code, but it stopped working after I added the character limit. `^(?!.*zzz)?=.{3}\/\/.{0,100}[a-zA-Z0-9])(?=\S+$).{2,5000}` (not working), `(?=.{3}\/\/.{0,100}[a-zA-Z0-9])(?=\S+$).{2,5000}` (working) – Randhir Singh Mar 22 '21 at 15:00
  • https://regex101.com/r/QFxgqo/1 – Randhir Singh Mar 22 '21 at 15:05
  • @RandhirSingh If you have additional requirements that are NOT already listed in the question, you need to edit the question and add them. Excluding a line that starts with "zzz" would be an example. You want everything necessary to answer the question inside the question itself so that readers don't have to read all the comments to get all the info necessary to write a good answer. – JeffC Mar 22 '21 at 15:12

1 Answer

To get three chars before // and five chars after //, you can use

.{0,3}//.{0,5}
.{3}//.{5}

See the regex demo #1 and regex demo #2.

Note that .{0,3}//.{0,5} is the one to use when you also expect matches with fewer chars before or after //, for example because the // is close to the start or end of the string.

The .{3}//.{5} regex will not match in a string like ab//abcde, for example, because it requires exactly three chars before // and exactly five chars after it.
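
To make the difference concrete, here is a minimal Python sketch that runs both patterns against the sample input from the question and against the short ab//abcde string. The helper name extract and the constant names are made up for this illustration; only the two patterns come from the answer.

```python
import re

# The two patterns from the answer.
FLEXIBLE = re.compile(r".{0,3}//.{0,5}")  # up to 3 chars before, up to 5 after
STRICT = re.compile(r".{3}//.{5}")        # exactly 3 chars before, exactly 5 after


def extract(pattern, text):
    """Return the first match of pattern in text, or None if there is none.

    Hypothetical helper for this example only.
    """
    match = pattern.search(text)
    return match.group(0) if match else None


print(extract(FLEXIBLE, "ABCDEFGHIJK//MNOPQRST"))  # IJK//MNOPQ
print(extract(STRICT, "ABCDEFGHIJK//MNOPQRST"))    # IJK//MNOPQ
print(extract(FLEXIBLE, "ab//abcde"))              # ab//abcde
print(extract(STRICT, "ab//abcde"))                # None
```

Both patterns give the same result when enough chars surround //; they only differ when // sits close to either end of the string.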

Depending on how you declare the regex, you might need to escape / (for example, as \/ in a regex literal that uses / as its delimiter).
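
As one illustration, assuming Python: patterns there are ordinary strings, so / is not a delimiter and needs no escaping, although the escaped form is accepted and matches the same text. The variable names below are made up for this sketch.

```python
import re

# In Python the pattern is a plain (raw) string, so "/" has no special meaning.
PLAIN = re.compile(r".{3}//.{5}")

# The escaped spelling, as needed in /.../ regex literals, is also valid here.
ESCAPED = re.compile(r".{3}\/\/.{5}")

sample = "ABCDEFGHIJK//MNOPQRST"
plain_hit = PLAIN.search(sample).group(0)
escaped_hit = ESCAPED.search(sample).group(0)

print(plain_hit)                 # IJK//MNOPQ
print(plain_hit == escaped_hit)  # True
```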

More details:

  • .{0,3} - zero to three chars other than line break chars
  • .{3} - three chars other than line break chars
  • // - a // string
  • .{5} - five chars other than line break chars
  • .{0,5} - zero to five chars other than line break chars

Now, answering your edit and comment: if you want to extract a .{3}//.{5} substring from a string that does not start with zzz and consists only of 2 to 5000 non-whitespace chars, you can use either of

^(?!zzz)(?=\S{2,5000}$).*(.{3}//.{0,100})(?!\w)
^(?!zzz)(?=\S{2,5000}$).*?(.{3}//.{0,100})(?!\w)

Grab Group 1. See the regex demo. Details:

  • ^ - start of string
  • (?!zzz) - no zzz allowed at the start of a string
  • (?=\S{2,5000}$) - the string must only consist of two to 5000 non-whitespace chars
  • .*? - match/consume any zero or more chars other than line break chars, as few as possible (.* consumes as many as possible)
  • (.{3}//.{0,100}) - any 3 chars other than line break chars, //, and any 0 to 100 chars other than line break chars
  • (?!\w) - not followed by a word char. Remove it if this check is not required.
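
The following Python sketch shows how Group 1 might be read from the lazy variant of the pattern above. Note that, as written, the pattern allows up to 100 chars after //, so Group 1 runs to the end of the sample input. CAPPED_RE is a hypothetical variant, not part of the answer: it assumes you want at most five chars after // and therefore drops the (?!\w) check, which would otherwise reject a tail cut off in the middle of a word. The helper name group1 is also made up for this example.

```python
import re

# Lazy pattern from the answer; the extracted text is in Group 1.
ANSWER_RE = re.compile(r"^(?!zzz)(?=\S{2,5000}$).*?(.{3}//.{0,100})(?!\w)")

# Hypothetical variant (an assumption, not from the answer):
# cap the tail at five chars and drop the (?!\w) check.
CAPPED_RE = re.compile(r"^(?!zzz)(?=\S{2,5000}$).*?(.{3}//.{0,5})")


def group1(pattern, line):
    """Return Group 1 of the first match in line, or None if nothing matches."""
    match = pattern.search(line)
    return match.group(1) if match else None


print(group1(ANSWER_RE, "ABCDEFGHIJK//MNOPQRST"))     # IJK//MNOPQRST
print(group1(CAPPED_RE, "ABCDEFGHIJK//MNOPQRST"))     # IJK//MNOPQ
print(group1(CAPPED_RE, "zzzABCDEFGHIJK//MNOPQRST"))  # None (starts with zzz)
print(group1(CAPPED_RE, "ABC DEFGHIJK//MNOPQRST"))    # None (contains whitespace)
```

Each call handles one line at a time; for multi-line input, one option is to split it into lines first and call the helper on each.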
Wiktor Stribiżew