5

I am new to regular expressions, so my question may be very basic.

I want to create a regular expression that searches for a pattern on a particular line number.

E.g., I have this data:

"\nerferf erferfre erferf 12545" + 
"\ndsf erf" + 
"\nsdsfd refrf refref" + 
"\nerferf erferfre erferf 12545" + 
"\ndsf erf" + 
"\nsdsfd refrf refref" + 
"\nerferf erferfre erferf 12545" + 
"\ndsf erf" + 
"\nsdsfd refrf refref" + 
"\nerferf erferfre erferf 12545" + 

I want to search for the number 1234 on the 7th line. It may or may not also be present on other lines.

I have tried with

"\\n.*\\n.*\\n.*\\n.*\\n.*\\n.*\\d{4}" 

but I am not getting the expected result.

Please help me out with the regular expression.

Nitesh Gupta

4 Answers

5

Firstly, your newline characters should be placed at the end of the lines. That way, picturing a particular line is easier. The explanation below is based on this modification.

Now, to get to the 7th line, you first need to skip the first 6 lines, which you can do with the {n} quantifier. You don't need to write .*\n 6 times. So, that would look like this:

(.*\n){6}

You are then on the 7th line, where you can match the required digits. That part would be something like this:

.*?1234

Then match the rest of the text, using .*

So, your final regex would look like:

(?s)(.*\n){6}.*?1234.*

So, just use String#matches(regex) method with this regex.

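P.S. (?s) enables single-line (DOTALL) mode, because by default the dot (.) does not match the newline character, and the trailing .* has to consume the remaining lines for matches() to succeed. Be aware that the flag applies to the whole pattern here, so the .* inside the {6} group and the lazy .*? can cross line breaks too: the pattern also matches when 1234 first appears on a line after the 7th. To pin the match to the 7th line, scope the flag to the tail, e.g. (?:.*\n){6}.*?1234(?s:.*).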
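As a sanity check, here is a minimal sketch that runs the pattern above next to a variant with (?s) scoped to the trailing .* only. The class name, the `lines` helper, and the filler text are made up for illustration; only `Pattern` and `Matcher.matches()` come from the JDK.

```java
import java.util.regex.Pattern;

public class SeventhLineMatch {
    // Pattern from the answer: DOTALL is active for the whole expression.
    static final Pattern WHOLE_DOTALL = Pattern.compile("(?s)(.*\\n){6}.*?1234.*");
    // Hypothetical variant: DOTALL only for the tail, so the first six
    // groups and the lazy .*? cannot cross a line break.
    static final Pattern SCOPED = Pattern.compile("(?:.*\\n){6}.*?1234(?s:.*)");

    // Builds `total` newline-terminated lines; only line `lineWith1234`
    // (1-based) contains the text 1234.
    static String lines(int lineWith1234, int total) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= total; i++) {
            sb.append(i == lineWith1234 ? "abc 1234 def" : "dsf erf").append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String onSeventh = lines(7, 10);
        String onNinth = lines(9, 10);
        System.out.println(WHOLE_DOTALL.matcher(onSeventh).matches()); // true
        System.out.println(WHOLE_DOTALL.matcher(onNinth).matches());   // true (line 9 also accepted)
        System.out.println(SCOPED.matcher(onSeventh).matches());       // true
        System.out.println(SCOPED.matcher(onNinth).matches());         // false
    }
}
```

Whether the looser behaviour matters depends on the input; if 1234 can only ever appear on the 7th line, both patterns give the same answer.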

To print something you matched, you can use capture groups:

(?s)(?:.*\n){6}.*?(1234).*

This captures 1234 in group 1 if it matches. Capturing an exact string you are already matching is unusual, though: it makes little sense here, since you know you are matching 1234. It would be useful if you matched against \\d{4} instead, in which case you might want to know exactly which digits were found.
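To illustrate the capture-group idea with \\d{4} rather than a literal, here is a small sketch. It assumes newline-terminated lines as described above and uses `Matcher.lookingAt()` so the pattern is anchored at the start of the input without having to match the rest of it; the class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeventhLineCapture {
    // Skip six lines without capturing them, then capture the first
    // run of four digits found on line 7. No DOTALL, so '.' stays on one line.
    static final Pattern P = Pattern.compile("(?:.*\\n){6}.*?(\\d{4})");

    // Returns the captured digits, or null if line 7 has no four-digit run.
    static String firstFourDigitsOnLine7(String text) {
        Matcher m = P.matcher(text);
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String hit = "a\nb\nc\nd\ne\nf\nerferf erferfre erferf 1234\nh\n";
        String miss = "a\nb\nc\nd\ne\nf\ndsf erf\nh 9999\n";
        System.out.println(firstFourDigitsOnLine7(hit));  // 1234
        System.out.println(firstFourDigitsOnLine7(miss)); // null
    }
}
```

Because only group 1 is printed, the six skipped lines never show up in the output, which is the behaviour asked for in the comments.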

Rohit Jain
  • Thanks. For searching it is skipping first 6 lines but when it is printing, it is printing first six lines also. – Nitesh Gupta Apr 22 '13 at 11:08
  • @NiteshGupta. Just capture whatever you want to print in a group. What do you want to print anyways? – Rohit Jain Apr 22 '13 at 11:11
  • Can't we remove the unwanted lines from regular expressions only. Just a thought. I don't know much about regular expressions. – Nitesh Gupta Apr 22 '13 at 11:13
  • @NiteshGupta. Of course you can do, but again the question - Do you just want that 7th line back? If yes, then capture it in a group. That's it. I've edited my answer a little bit. – Rohit Jain Apr 22 '13 at 11:15
  • Do note that `.` in Java [excludes more than just `\n`](http://stackoverflow.com/questions/14648743/whats-the-difference-between-these-regex/14648811#14648811). The current regex will exclude some character from the whole string (if it matters). Another thing is that you don't need to search more after you have found `1234` – nhahtdh Apr 22 '13 at 11:50
2

Try

Pattern p = Pattern.compile("^(\\n.*){6}\\n.*\\d{4}" );
System.out.println(p.matcher(s).find());
Arun P Johny
1

This problem is better not solved with regex alone. Start by splitting the string on a newline character, to get an array of lines:

String[] lines = data.split("\\n");

Then, to execute the regex on line 7:

try {
    String line7 = lines[6];
    // do something with it
} catch (IndexOutOfBoundsException ex) {
    System.err.println("Line not found");
}
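Put together, the split-then-match approach might look like the sketch below. The class and method names are hypothetical, and \\d{4} stands in for whatever is being searched for. Note that the sample data in the question starts every piece with \n, so for that exact string element 0 of the array is empty and the 7th line of text sits at index 7, not 6.

```java
import java.util.regex.Pattern;

public class SplitThenMatch {
    static final Pattern FOUR_DIGITS = Pattern.compile("\\d{4}");

    // Returns true if the given 1-based line exists and contains
    // four consecutive digits; an explicit bounds check replaces the try/catch.
    static boolean lineHasFourDigits(String data, int lineNumber) {
        // Limit -1 keeps trailing empty lines instead of dropping them.
        String[] lines = data.split("\\n", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            return false;
        }
        return FOUR_DIGITS.matcher(lines[lineNumber - 1]).find();
    }

    public static void main(String[] args) {
        String data = "a\nb\nc\nd\ne\nf\ng 1234\nh";
        System.out.println(lineHasFourDigits(data, 7));  // true
        System.out.println(lineHasFourDigits(data, 6));  // false
        System.out.println(lineHasFourDigits(data, 20)); // false
    }
}
```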

Hope this is a start for you.

Edit: I'm not a pro in Regex but I would try with this one:

"(\\n.*){6}(\\n.*)"

Sorry if this isn't the correct Java syntax, but this should first consume six newline-prefixed lines, and the 7th line should then be available in the second capture group (including its leading newline). If you want to exclude the newline in front:

"(\\n.*){6}\\n(.*)"
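For the question's data, where every line starts with \n rather than ending with it, a sketch that skips six such lines and captures the seventh could look like this. It is an illustration under that assumption, not a tested drop-in: the class and method names are invented, the skipped lines use a non-capturing group so the wanted text lands in group 1, and `lookingAt()` anchors the match at the start of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingNewlineCapture {
    // Skip six "\n + text" lines, then capture the text of the seventh
    // without its leading newline.
    static final Pattern P = Pattern.compile("(?:\\n.*){6}\\n(.*)");

    // Returns the 7th line, or null if the input has fewer than 7 lines.
    static String seventhLine(String data) {
        Matcher m = P.matcher(data);
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String data =
            "\nerferf erferfre erferf 12545" +
            "\ndsf erf" +
            "\nsdsfd refrf refref" +
            "\nerferf erferfre erferf 12545" +
            "\ndsf erf" +
            "\nsdsfd refrf refref" +
            "\nerferf erferfre erferf 12545" +
            "\ndsf erf";
        System.out.println(seventhLine(data));     // erferf erferfre erferf 12545
        System.out.println(seventhLine("\na\nb")); // null
    }
}
```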
MarioDS
  • Thanks for your reply. But I am learning RegEx and want to solve this issue with that only. I can't split the string. – Nitesh Gupta Apr 22 '13 at 10:48
0

You can use:

(^.*\r\n)(^.*\r\n)(^.*\r\n)(^.*\r\n)(^.*\r\n)(^.*\r\n)(^.*)(1234)
Roney Michael