.NET RegEx - First N chars of First M lines

Question

I want 4 general RegEx expressions for the following 4 basic cases:

Up to A chars starting after B chars from start of line on up to C lines starting after D lines from start of file
Up to A chars starting after B chars from start of line on up to C lines occurring before D lines from end of file
Up to A chars starting before B chars from end of line on up to C lines starting after D lines from start of file
Up to A chars starting before B chars from end of line on up to C lines starting before D lines from end of file

These would allow selecting arbitrary text blocks anywhere in the file.

So far I have managed to come up with cases that only work for lines and chars separately:

(?<=(?m:^[^\r]{N}))[^\r]{1,M} = UP TO M chars OF EVERY LINE, AFTER FIRST N chars
[^\r]{1,M}(?=(?m:.{N}\r$)) = UP TO M chars OF EVERY LINE, BEFORE LAST N chars

The above 2 expressions are for chars, and they return MANY matches (one for each line).
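
As a quick check of these two character expressions, here is a minimal sketch using Python's `re` module instead of .NET (an assumption made for illustration; for this input the two engines should agree on `.`, `$` and the scoped `(?m:...)` group). `SAMPLE`, `N` and `M` are illustrative names, and the sample text ends every line, including the last, with `\r\n`.

```python
import re

# Sample text with Windows line endings (every line ends in \r\n).
SAMPLE = "".join("line%d blah %d\r\n" % (i, i) for i in range(1, 7))

N, M = 2, 4  # illustrative values: skip N chars, take up to M chars

# Up to M chars of every line, after the first N chars.
after_first = re.compile(r"(?<=(?m:^[^\r]{%d}))[^\r]{1,%d}" % (N, M))

# Up to M chars of every line, before the last N chars.
before_last = re.compile(r"[^\r]{1,%d}(?=(?m:.{%d}\r$))" % (M, N))

print(after_first.findall(SAMPLE))  # one match per line
print(before_last.findall(SAMPLE))  # one match per line
```

Both expressions yield six matches here, one per line, which is the "MANY matches" behaviour described above.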

(?<=(\A([^\r]*\r\n){N}))(?m:\n*[^\r]*\r$){1,M} = UP TO M lines AFTER FIRST N lines
(((?=\r?)\n[^\r]*\r)|((?=\r?)\n[^\r]+\r?)){1,M}(?=((\n[^\r]*\r)|(\n[^\r]+\r?)){N}\Z) = UP TO M lines BEFORE LAST N lines from end

These 2 expressions are equivalents for the lines, but they always return just ONE match.

The task is to combine these expressions to allow for scenarios 1-4. Can anyone help?

Note that the case in the title of the question is just a special case of scenario #1, where both B = 0 and D = 0.

EXAMPLE 1: Characters 3-6 of lines 3-5. A total of 3 matches.

SOURCE:

line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6

RESULT:

<match>ne3 </match>
<match>ne4 </match>
<match>ne5 </match>

EXAMPLE 2: Last 4 characters of 2 lines before 1 last line. A total of 2 matches.

SOURCE:

line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6

RESULT:

<match>ah 4</match>
<match>ah 5</match>
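
To pin down what the four cases mean, here is a non-regex reference sketch in Python. It is not a solution to the question, which asks for pure regex. The function name `select_block` and its two flags are hypothetical, and it assumes that a line with too few characters yields a shorter match or no match at all, which is how the "up to" wording reads.

```python
def select_block(text, a, b, c, d, chars_from_end=False, lines_from_end=False):
    """Up to `a` chars, offset `b` chars from the line start (or end),
    on up to `c` lines, offset `d` lines from the file start (or end)."""
    lines = text.splitlines()
    if lines_from_end:
        stop = max(0, len(lines) - d)
        chosen = lines[max(0, stop - c):stop]
    else:
        chosen = lines[d:d + c]

    matches = []
    for line in chosen:
        if chars_from_end:
            end = max(0, len(line) - b)
            piece = line[max(0, end - a):end]
        else:
            piece = line[b:b + a]
        if piece:  # an empty selection is not a match
            matches.append(piece)
    return matches


SAMPLE = "\n".join("line%d blah %d" % (i, i) for i in range(1, 7))

# Example 1: characters 3-6 of lines 3-5 (A=4, B=2, C=3, D=2).
print(select_block(SAMPLE, 4, 2, 3, 2))
# Example 2: last 4 characters of the 2 lines before the last line.
print(select_block(SAMPLE, 4, 0, 2, 1, chars_from_end=True, lines_from_end=True))
```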

This looks like a task which would be solved really easily *without* regular expressions (and particularly with LINQ)... why are you trying to use a regex here? — Jon Skeet, Dec 29 '10 at 09:03
First of all I am restricted to .NET 2.0 (so LINQ is not an option, unfortunately), and Secondly I need the flexibility of RegEx to allow for more sophisticated expressions that will build on these (i.e. not selecting every character, but only some specific ones, etc.) — Fit Dev, Dec 29 '10 at 09:09
Those regexes are incomprehensible. You would be much better off making functions to do this instead of regex, IMO. — , Dec 29 '10 at 09:52
I can't understand most of your post... Can you post a few examples, or at least one? Why not splitting the lines and picking substrings? — Kobi, Dec 29 '10 at 10:15
Thanks for comments, guys. Unfortunately due to nature of the project it has to be done in RegEx. Sorry that the expressions are hard to understand, but I am afraid that's the way RegEx is... As for the examples, I have added one at the bottom. — Fit Dev, Dec 29 '10 at 14:47
Why has this got to be solved exclusively with RegEx? The current requirements (i.e. line and character positions) make it a god awful choice. — Tim Lloyd, Dec 29 '10 at 16:34
Too many reasons, including compatibility, consistency, Interface requirements, UI, etc. — Fit Dev, Dec 29 '10 at 18:13
To those interested, the fruits of this work went into making RegEx helper tools for Batch RegEx software: http://www.binarymark.com/Products/BatchRegEx/default.aspx — Fit Dev, Jan 25 '13 at 15:08

Tim Pietzcker · Answer 1 · 2010-12-30T14:18:38.003

2

Here's one regex for basic case 2:

Regex regexObj = new Regex(
    @"(?<=              # Assert that the following can be matched before the current position
     ^                # Start of line
     .{2}             # 2 characters (B = 2)
    )                 # End of lookbehind assertion
    .{1,3}            # Match 1-3 characters (A = 3)
    (?=               # Assert that the following can be matched after the current position
     .*$              # rest of the current line
     (?:\r\n.*){2,4}  # 2 to 4 entire lines (D = 2, C = 4+1-2)
     \z               # end of the string
    )", 
    RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

In the text

line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6

it will match

ne2
ne3
ne4

(ne2 starts at the third character (B=2) in the fifth-to-last line (C+D = 5), etc.)

edited Dec 30 '10 at 14:18

answered Dec 29 '10 at 21:07

Tim Pietzcker


For some reason, this one does not seem to work at all :( Tried playing around with no success. Also, not 100% sure about using "\r\n.*" here to account for any line in general - because since we are counting from the end of file, if we have reached the first line, there would be no \r and \n before the contents of the line, since it's the first one. – Fit Dev Dec 30 '10 at 09:31
Works here (in RegexBuddy), that is if I understood your requirements correctly. I added an example of what it does. No worries about the leading `\r\n`; when the range increases (say `{2,8}`) it doesn't hurt if it hits the first line. – Tim Pietzcker Dec 30 '10 at 14:21
Hmmm for some reason this does not work at all for me. I tried several .NET-based tools like Expresso, and in all cases there were no matches. Though just by looking at this RegEx I can't see what may be wrong... Apparently there must be something in the .NET flavor that behaves differently. And I absolutely made sure that the correct RegEx options were turned on. – Fit Dev Dec 30 '10 at 15:32
I think I might have found the one that works for case 2: "(?m:(?<=^.{2})(?=[^\r]*\r?(?:\r\n.*){2,4}\z))[^\r]{1,4}". 1) I changed "." to "[^\r]" and added another "\r" where it was needed as a result of the change. 2) I changed "$" to "\r?" - there was some problem counting lines with $. 3) And most importantly I switched places of ".{1,3}" and "(?=...)" group - that was the main culprit. I guess .NET wants all lookaheads and lookbehinds appear BEFORE any captures. – Fit Dev Dec 30 '10 at 15:53
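
One possible explanation for the failure reported in these comments, offered as an assumption rather than something the thread confirms: in .NET, `.` matches `\r` and a multiline `$` matches only before `\n`, so on `\r\n` text the `.*$` part swallows the `\r` that `(?:\r\n.*)` needs next. Python's `re` module behaves the same way on this point, so the sketch below uses it to reproduce the effect (Python's `\Z` plays the role of .NET's `\z`). The `fixed` pattern is a hypothetical variant, not the one from the answer.

```python
import re

# Six sample lines joined with Windows line endings, no trailing newline.
text = "\r\n".join("line%d blah %d" % (i, i) for i in range(1, 7))

# The answer's pattern, written on one line.
original = re.compile(r"(?<=^.{2}).{1,3}(?=.*$(?:\r\n.*){2,4}\Z)", re.MULTILINE)

# Hypothetical variant: never let a "character" be \r or \n, and drop `$`.
fixed = re.compile(
    r"(?<=^[^\r\n]{2})[^\r\n]{1,3}(?=[^\r\n]*(?:\r\n[^\r\n]*){2,4}\Z)",
    re.MULTILINE,
)

print(original.findall(text))  # no matches on \r\n text
print(fixed.findall(text))     # characters 3-5 of lines 2-4
```

If this diagnosis is right, the order of the lookahead and the consuming part is not what matters; excluding `\r` from the "character" class is.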

Dan Tao · Answer 2 · 2010-12-29T19:06:39.770

Edit: Based on your comments, it sounds like this really is something out of your control. The reason I posted this answer is that I feel like often, especially when it comes to regular expressions, developers get easily caught up in the technical challenge and lose sight of the actual goal: solving the problem. I know I'm this way too. I think it's just an unfortunate consequence of being both technically and creatively minded.

So I wanted to refocus you, if possible, on the problem at hand, and stress that, in the presence of a well-stocked toolset, Regex is not the right tool for this job. If it's the only tool at your disposal for reasons outside your control, then, of course, you have no choice.

I figured you probably had real reasons for demanding a Regex solution; but since those reasons weren't fully explained, I felt there was still a chance you were just being stubborn ;)

You say this needs to be done in Regex, but I'm not convinced!

First of all I am restricted to .NET 2.0 [ . . . ]

No problem. Who says you need LINQ for a problem like this? LINQ just makes things easier; it doesn't make impossible things possible.

Here's one way you could implement the first case from your question, for example (and it would be fairly straightforward to refactor this into something more flexible, allowing you to cover cases 2–3 as well):

public IEnumerable<string> ScanText(TextReader reader,
                                    int start,
                                    int count,
                                    int lineStart,
                                    int lineCount)
{
    int i = 0;
    while (i < lineStart && reader.Peek() != -1)
    {
        reader.ReadLine();
        ++i;
    }

    i = 0;
    while (i < lineCount && reader.Peek() != -1)
    {
        string line = reader.ReadLine();

        if (line.Length < start)
        {
            yield return ""; // or null? or continue?
        }
        else
        {
            int length = Math.Min(count, line.Length - start);
            yield return line.Substring(start, length);
        }

        ++i;
    }
}

So there's a .NET 2.0-friendly solution to the general problem, without using regular expressions (or LINQ).

Secondly I need the flexibility of RegEx to allow for more sophisticated expressions that will build on these [ . . . ]

Maybe I'm just being dense; what's preventing you from starting with something non-Regex, and then using Regex for more "sophisticated" behavior on top of that? If you need to do additional processing on the lines returned by ScanText above, for instance, you can certainly do so using Regex. But to insist on using Regex from the start seems... I don't know, just unnecessary.
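
The layering suggested here could look roughly like the following Python sketch. Python is used only for illustration; `scan_text` is a hypothetical port of the ScanText method above, and the digit pattern stands in for whatever "more sophisticated" expression is actually needed.

```python
import re


def scan_text(lines, start, count, line_start, line_count):
    """Yield up to `count` chars from offset `start` of each selected line."""
    for line in lines[line_start:line_start + line_count]:
        yield line[start:start + count]


lines = ["line%d blah %d" % (i, i) for i in range(1, 7)]

# Step 1: plain code picks the block (characters 3-6 of lines 3-5).
block = list(scan_text(lines, 2, 4, 2, 3))

# Step 2: a regex refines the selection, here keeping only the digits.
digits = [m for piece in block for m in re.findall(r"\d", piece)]

print(block)
print(digits)
```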

Unfortunately due to nature of the project it has to be done in RegEx [ . . . ]

If that's truly the case, then very well. But if your reasons are only those from the excerpts above, then I disagree that this particular aspect of the problem (scanning certain characters from certain lines of text) needs to be addressed using Regex, even if Regex will be required for other aspects of the problem not covered in the scope of this question.

If, on the other hand, you're being forced to use Regex for some arbitrary reason—say, someone chose to write in some requirement/spec, possibly without putting much thought into it, that regular expressions would be used for this task—well, I would personally advise fighting against it. Explain to whoever is in a position to change this requirement that Regex is not necessary and that the problem can easily be solved without using Regex... or using a combination of "normal" code and Regex.

The only other possibility I can think of (though this may be the result of my own lack of imagination) that would explain you needing to use Regex for the problem you've described in your question is that you're restricted to using a particular tool that exclusively accepts regular expressions as user input. But your question is tagged .net, and so I have to assume there is some degree to which you can write your own code to be used in solving this problem. And if that's the case, then I will say it again: I don't think you need Regex ;)

+1. I wrote Regex Hero to help people with regular expressions and even I have to agree with you. For the most part, regular expressions are a tool there to save you time from parsing text the hard way. But when you're spending more time bending regular expressions to your will to accomplish something that could be done more easily and efficiently with procedural code, then that kind of defeats the purpose. — Steve Wortham, Dec 29 '10 at 17:24
Thanks, Dan, for the extensive answer. That is certainly a nice solution, and I am aware of several ways to do it procedurally, but for backwards-compatibility and compatibility with other applications, it needed to be RegEx. The only reason I included .net was to specify the precise RegEx flavor, as they may differ. — Fit Dev, Dec 29 '10 at 18:18
@George: You might want to show your appreciation by upvoting helpful answers, especially when asking such a complicated question and receiving such a detailed answer. — Tim Pietzcker, Dec 29 '10 at 18:26
@Tim I would certainly love to, but apparently only those registered can do that. — Fit Dev, Dec 29 '10 at 18:36
@George: OK, I hadn't noticed that you're not registered yet. I suggest you do register, then. It's worth it. For all of us :) — Tim Pietzcker, Dec 29 '10 at 18:41
@George: Not a big deal—I posted this not for upvotes but to steer you away from what I thought *might* have been a stubborn insistence on using the wrong tool for the technical challenge alone. See my edit for a fuller explanation. Unfortunately, I'm no Regex guru; but I wish you the best of luck with this bothersome problem. — Dan Tao, Dec 29 '10 at 19:07
@Dan I totally agree with you on this, but unfortunately it's out of my control. But your post will definitely benefit people! — Fit Dev, Dec 29 '10 at 19:31

Tim Pietzcker · Accepted Answer · 2010-12-30T14:13:41.297

1

For starters, here's an answer for "Basic Case 1":

Regex regexObj = new Regex(
    @"(?<=            # Assert that the following can be matched before the current position
     \A               # Start of string
     (?:.*\r\n){2,4}  # 2 to 4 entire lines (D = 2, C = 4+1-2)
     .{2}             # 2 characters (B = 2)
    )                 # End of lookbehind assertion
    .{1,3}            # Match 1-3 characters (A = 3)", 
    RegexOptions.IgnorePatternWhitespace);

You can now iterate over the matches using

Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // matched text: matchResults.Value
    // match start: matchResults.Index
    // match length: matchResults.Length
    matchResults = matchResults.NextMatch();
}

So, in the text

line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6

it will match

ne3
ne4
ne5
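
The quantifier bounds generalize: for up to C lines after the first D lines, the lookbehind must see between D and C+D-1 complete lines before the current one. A small Python sketch that only builds the pattern text makes the arithmetic explicit. The helper name `case1_pattern` is hypothetical, and the resulting pattern is meant for the .NET engine, since Python's `re` module rejects variable-length lookbehind.

```python
def case1_pattern(a, b, c, d):
    """Pattern text for: up to `a` chars after `b` chars, on up to `c`
    lines after the first `d` lines (intended for the .NET engine)."""
    return r"(?<=\A(?:.*\r\n){%d,%d}.{%d}).{1,%d}" % (d, c + d - 1, b, a)


# Values from the answer above: A=3, B=2, C=3, D=2.
print(case1_pattern(3, 2, 3, 2))
```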

edited Dec 30 '10 at 14:13

answered Dec 29 '10 at 17:50

Tim Pietzcker


@Tim YES! Thank you! That works excellently for case 1! Exactly what I need! It would be excellent if you could do a similar thing for case 2 (since the end of file is a different story and expressions involving it are a bit harder...)! And just a small side question: why, to get 3 lines after the first 2 lines, do you end up using {2,4} instead of {3,5}? Are the line indices 0-based? – Fit Dev Dec 29 '10 at 18:31
For the side question: The assertion checks that there are 2, 3 or 4 lines *before* the current line, so this makes sure that the current line is line number 3, 4 or 5. To get five lines starting with line number 6, you'd therefore need `{5,9}`. – Tim Pietzcker Dec 29 '10 at 18:36
As for writing the other regexes: No problem, if you upvote the answers that help you. – Tim Pietzcker Dec 29 '10 at 18:36
@Tim You convinced me to register, since so many features are disabled for non-registered users. Though frankly, too bad this site does not offer its own user system... I'm not too keen to share the same ID on multiple sites. I certainly upvoted your answer! :) – Fit Dev Dec 29 '10 at 18:54
It seems I have problems with case 3 as well - cannot figure out how to rewrite your RegEx to make characters counted from end of line, even though that apart from this condition, it's exactly like case 1. – Fit Dev Dec 29 '10 at 18:56
@George: For some insight into why SO leverages OpenID for user authentication, see [Jeff Atwood's blog entry: Your Internet Driver's License](http://www.codinghorror.com/blog/2010/11/your-internet-drivers-license.html). Personally I think the reasoning is actually quite sound. – Dan Tao Dec 29 '10 at 19:49

Tim Pietzcker · Answer 4 · 2010-12-30T14:29:13.593

1

Here's one for basic case 3:

Regex regexObj = new Regex(
    @"(?<=            # Assert that the following can be matched before the current position
     \A               # Start of string
     (?:.*\r\n){2,4}  # 2 to 4 entire lines (D = 2, C = 4+1-2)
     .*               # any number of characters
    )                 # End of lookbehind assertion
    (?=               # Assert that the following can be matched after the current position
     .{8}             # 8 characters (B = 8)
     $                # end of line
    )                 # End of lookahead assertion
    .{1,3}            # Match 1-3 characters (A = 3)", 
    RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

So in the text

line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6

it will match

3 b
4 b
5 b

(3 b because it's 3 characters (A = 3), starting at the 8th-to-last character (B = 8), starting in the third line (D = 2), etc.)

edited Dec 30 '10 at 14:29

answered Dec 29 '10 at 21:12

Tim Pietzcker


Thank you, Tim! This one kind of works, except it does not account for B in case 3; in other words, it behaves as if B=0. In addition, I had to modify it a bit to "(?m:(?<=\A(?:.*\r\n){2,5}.*)(?=[^\r]{5}\r?$).{1,3})" (changed ".{5}" to "[^\r]{5}\r?", since I don't want \r to be treated as a character in a line). Unfortunately I don't know how to solve the main problem with accounting for B... – Fit Dev Dec 30 '10 at 09:22
This works very well, but has the same problem as your edit for case 4 - if the number of characters in a line is less than A+B, no match is returned. But it should be the case only if the number of characters in a line is <=B (not <=(A+B)). – Fit Dev Dec 30 '10 at 15:24
After some more experimentation I still cannot get this regex to work. After you plug-in the values, though it looks like "(?m:(?<=\A(?:.*\r\n){D,C+D-1}.*)(?=[^\r]{A+B}\r?$).{1,A})", however as I described above no match is returned as soon as A+B exceeds the number of characters. This is quite important, because as a result it is not possible to select all characters until the beginning of a line, which would be very useful. – Fit Dev Dec 30 '10 at 19:29
At last I was able to find the correct RegEx for case 3: "(?m:.{1,A}(?<=.*(?=[^\r]{B}\r?$)(?<=\A(?:.*\r\n){D,C+D-1}[^\r]*)))" So there is just case 4 left... Without you, Tim, I wouldn't be able to do any of that! Thanks a lot! I just hope you can help me out with case 4. – Fit Dev Dec 30 '10 at 20:29

Tim Pietzcker · Answer 5 · 2010-12-30T14:26:31.703

1

And finally one solution for basic case 4:

Regex regexObj = new Regex(
    @"(?=             # Assert that the following can be matched after the current position
     .{8}             # 8 characters (B = 8)
     (?:\r\n.*){2,4}  # 2 to 4 entire lines (D = 2, C = 4+1-2)
     \z               # end of the string
    )                 # End of lookahead assertion
    .{1,3}            # Match 1-3 characters (A = 3)", 
    RegexOptions.IgnorePatternWhitespace);

In the text

line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6

this will match

2 b
3 b
4 b

(2 b because it's three characters (A = 3), starting at the 8th-to-last character (B = 8) in the fifth-to-last line (C+D = 5), etc.)

edited Dec 30 '10 at 14:26

answered Dec 29 '10 at 21:17

Tim Pietzcker


This one works very well for selecting the lines (I'll see if I can adapt this for case 2). But there are some funny things with character selection here. For a start the number of characters in the match seems to be determined not by "{1,3}", which it should be, but rather by "{5}", which should be the number of characters to skip from line's end. – Fit Dev Dec 30 '10 at 09:44
I must thank you, Tim, for your great contributions. These RegEx shed a lot of light on how the task of character selection should be approached in RegEx in general! – Fit Dev Dec 30 '10 at 09:49
This new edit almost solved the problem! Except for one case: if the line has less than A+B characters, then the number of characters in the match should be A>number_of_chars_in_match>0. The way it currently works is that there is simply no match, if the number of characters is less than A+B. I tried changing {8} to something like {6,8} but that does not seem to work either - it produces more matches than there should be... – Fit Dev Dec 30 '10 at 15:17
Actually, thanks to insights gained from your original expressions, Tim, I was able to figure out one for case 4 as well: "(?m:.{1,A}(?=.{B}(?:\r\n[^\r]*){D,C+D-1}\z))". So, thank you for your help, Tim! You really helped settle this! – Fit Dev Dec 30 '10 at 21:41
@George: That's great! Sorry I couldn't answer quicker as I was away from a computer over most of the last days. I'm glad that you could figure it out yourself. – Tim Pietzcker Dec 31 '10 at 08:41
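
The case 4 pattern from the comments above can be tried outside .NET as well. The sketch below is a Python port under two assumptions: `\z` is written `\Z` (Python's end-of-string anchor), and the helper name `case4_pattern` is made up for illustration. It is run against Example 2 from the question (A=4, B=0, C=2, D=1) on text with `\r\n` line endings and no trailing newline.

```python
import re


def case4_pattern(a, b, c, d):
    """Up to `a` chars ending `b` chars before the line end, on the `c`
    lines that come before the last `d` lines."""
    return (r"(?m:.{1,%d}(?=.{%d}(?:\r\n[^\r]*){%d,%d}\Z))"
            % (a, b, d, c + d - 1))


text = "\r\n".join("line%d blah %d" % (i, i) for i in range(1, 7))

print(re.findall(case4_pattern(4, 0, 2, 1), text))
```

Note that in this form the match ends B characters before the end of the line, whereas the pattern in the answer starts the match at the B-th-to-last character.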

score 0 · Answer 6 · edited Dec 29 '10 at 17:06

0

Why don't you just do something like this:

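// Assuming the text has been read into a string named sourceString
string[] splitString = sourceString.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None); // covers both Windows and Unix line endings
string[] newStrings = new string[Math.Min(M, splitString.Length)];
for (int i = 0; i < newStrings.Length; i++) {
    // First N characters of line i (the whole line if it is shorter); adjust the Substring arguments for other ranges
    newStrings[i] = splitString[i].Substring(0, Math.Min(N, splitString[i].Length));
}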

You don't need RegEx, you don't need LINQ.

Well I reread the start of your question and you could simply parameterize the start and end of the for loop and the Split to get exactly what you need.

edited Dec 29 '10 at 17:06

Robert P


answered Dec 29 '10 at 15:22

msarchet


That would certainly work, except it's not RegEx, and as I explained I need RegEx. In fact I am not 100% sure if it's possible to do it with RegEx, but on the other hand it would seem to me that such positional selection of characters should be pretty trivial. – Fit Dev Dec 29 '10 at 15:33
@George why do you need RegEx? You're over-obfuscating your intentions when someone has to read through all of that. – msarchet Dec 29 '10 at 15:46
Mainly - backwards compatibility and compatibility with other apps that only accept RegEx as input. – Fit Dev Dec 29 '10 at 18:20

score 0 · Answer 7 · answered Dec 30 '10 at 01:34

Excuse me for two points:

I propose solutions that aren't entirely regex based. I know you need pure regex solutions, but once I got into this interesting problem I quickly concluded that regexes overcomplicate it, and I didn't feel able to answer with pure regex solutions. I show what I found anyway; maybe it will give you ideas.
I don't know C# or .NET, only Python. Since regexes are nearly the same in all languages, I expected to answer with regexes alone, which is why I started looking into the problem. I show my solutions in Python all the same, because I think they are easy to understand anyway.

I think it's very difficult to capture all the occurrences of letters that you want in a text with a single regex, because finding several letters in several lines looks to me like finding nested matches within matches (maybe I'm just not skilled enough with regexes).

So I thought it better to first find the wanted letters in every line and put them in a list, and then pick the desired occurrences by slicing the list.

For finding the letters within a line, a regex seemed fine to me at first. Hence the solution with the function selectRE().

Afterwards, I realized that selecting letters in a line is the same as slicing the line at suitable indexes, which is the same as slicing a list. Hence the function select().

I give the two solutions together, so that the two functions can be checked to return the same results.

import re

def selectRE(a,which_chars,b,x,which_lines,y,ch):
    ch = ch[:-1] if ch.endswith('\n') else ch # drop the final newline to obtain an exact number of lines
    NL = ch.count('\n') +1 # number of lines

    def pat(a,which_chars,b):
        # build the per-line pattern; its single group captures the wanted characters
        if which_chars=='to':
            p = ('.{'+str(a-1)+'}' if a else '') + '(.{'+str(b-a+1)+'}).*(?:\n|$)'
        elif which_chars=='before':
            p = '.*(.{'+str(a)+'})'+('.{'+str(b)+'}' if b else '')+'(?:\n|$)'
        elif which_chars=='after':
            p = ('.{'+str(b)+'}' if b else '')+'(.{'+str(a)+'}).*(?:\n|$)'
        print(repr(p))
        return re.compile(p)

    # turn the line specification into a slice [x:y] over the list of lines
    if   which_lines=='to'    :  x   = x-1
    elif which_lines=='before':  x,y = NL-x-y,NL-y
    elif which_lines=='after' :  x,y = y,y+x

    return pat(a,which_chars,b).findall(ch)[x:y]


def select(a,which_chars,b,x,which_lines,y,ch):
    ch = ch[:-1] if ch.endswith('\n') else ch # drop the final newline to obtain an exact number of lines
    NL = ch.count('\n') +1 # number of lines

    # turn the character specification into a slice [a:b] over each line
    if   which_chars=='to'    :  a   = a-1
    elif which_chars=='after' :  a,b = b,a+b

    # turn the line specification into a slice [x:y] over the list of lines
    if   which_lines=='to'    :  x   = x-1
    elif which_lines=='before':  x,y = NL-x-y,NL-y
    elif which_lines=='after' :  x,y = y,y+x

    return [ line[len(line)-a-b:len(line)-b] if which_chars=='before' else line[a:b]
             for i,line in enumerate(ch.splitlines()) if x<=i<y ]


ch = '''line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6
'''
print(ch + '\n')

print('Characters 3-6 of lines 3-5. A total of 3 matches.')
print(selectRE(3,'to',6,3,'to',5,ch))
print(  select(3,'to',6,3,'to',5,ch))
print()
print('Characters 1-5 of lines 4-5. A total of 2 matches.')
print(selectRE(1,'to',5,4,'to',5,ch))
print(  select(1,'to',5,4,'to',5,ch))
print()
print('7 characters before the last 3 chars of lines 2-6. A total of 5 matches.')
print(selectRE(7,'before',3,2,'to',6,ch))
print(  select(7,'before',3,2,'to',6,ch))
print()
print('6 characters before the 2 last characters of 3 lines before the 3 last lines.')
print(selectRE(6,'before',2,3,'before',3,ch))
print(  select(6,'before',2,3,'before',3,ch))
print()
print('4 last characters of 2 lines before 1 last line. A total of 2 matches.')
print(selectRE(4,'before',0,2,'before',1,ch))
print(  select(4,'before',0,2,'before',1,ch))
print()
print('last 1 character of 4 last lines. A total of 4 matches.')
print(selectRE(1,'before',0,4,'before',0,ch))
print(  select(1,'before',0,4,'before',0,ch))
print()
print('7 characters before the last 3 chars of 3 lines after the 2 first lines. A total of 3 matches.')
print(selectRE(7,'before',3,3,'after',2,ch))
print(  select(7,'before',3,3,'after',2,ch))
print()
print('5 characters before the 3 last chars of the 5 first lines')
print(selectRE(5,'before',3,5,'after',0,ch))
print(  select(5,'before',3,5,'after',0,ch))
print()
print('Characters 3-6 of the 4 first lines')
print(selectRE(3,'to',6,4,'after',0,ch))
print(  select(3,'to',6,4,'after',0,ch))
print()
print('9 characters after the 2 first chars of the 3 lines after the 1 first line')
print(selectRE(9,'after',2,3,'after',1,ch))
print(  select(9,'after',2,3,'after',1,ch))

result

line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6


Characters 3-6 of lines 3-5. A total of 3 matches.
'.{2}(.{4}).*(?:\n|$)'
['ne3 ', 'ne4 ', 'ne5 ']
['ne3 ', 'ne4 ', 'ne5 ']

Characters 1-5 of lines 4-5. A total of 2 matches.
'.{0}(.{5}).*(?:\n|$)'
['line4', 'line5']
['line4', 'line5']

7 characters before the last 3 chars of lines 2-6. A total of 5 matches.
'.*(.{7}).{3}(?:\n|$)'
['ne2 bla', 'ne3 bla', 'ne4 bla', 'ne5 bla', 'ne6 bla']
['ne2 bla', 'ne3 bla', 'ne4 bla', 'ne5 bla', 'ne6 bla']

6 characters before the 2 last characters of 3 lines before the 3 last lines.
'.*(.{6}).{2}(?:\n|$)'
['1 blah', '2 blah', '3 blah']
['1 blah', '2 blah', '3 blah']

4 last characters of 2 lines before 1 last line. A total of 2 matches.
'.*(.{4})(?:\n|$)'
['ah 4', 'ah 5']
['ah 4', 'ah 5']

last 1 character of 4 last lines. A total of 4 matches.
'.*(.{1})(?:\n|$)'
['3', '4', '5', '6']
['3', '4', '5', '6']

7 characters before the last 3 chars of 3 lines after the 2 first lines. A total of 3 matches.
'.*(.{7}).{3}(?:\n|$)'
['ne3 bla', 'ne4 bla', 'ne5 bla']
['ne3 bla', 'ne4 bla', 'ne5 bla']

5 characters before the 3 last chars of the 5 first lines
'.*(.{5}).{3}(?:\n|$)'
['1 bla', '2 bla', '3 bla', '4 bla', '5 bla']
['1 bla', '2 bla', '3 bla', '4 bla', '5 bla']

Characters 3-6 of the 4 first lines
'.{2}(.{4}).*(?:\n|$)'
['ne1 ', 'ne2 ', 'ne3 ', 'ne4 ']
['ne1 ', 'ne2 ', 'ne3 ', 'ne4 ']

9 characters after the 2 first chars of the 3 lines after the 1 first line
'.{2}(.{9}).*(?:\n|$)'
['ne2 blah ', 'ne3 blah ', 'ne4 blah ']
['ne2 blah ', 'ne3 blah ', 'ne4 blah ']

And now I will study the tricky solutions of Tim Pietzcker
