
What I'm trying to do is match valid mail IDs in a given string using regular expressions. This is my code:

Pattern pat3 = Pattern.compile("[(a-z)+][(a-z\\d)]+{3,}\\@[(a-z)+]\\.[(a-z)+]");

Matcher mat3 = pat3.matcher("dasdsa@2 @ada. ss2@dad.2om p2@   2@2.2 fad2@yahoo.com 22@yahoo.com fad@yahoo.com");
System.out.println(mat3.pattern() + " ");

while(mat3.find()){
    System.out.println("Position: " + mat3.start() + " ");
}

The problem is that nothing is printed. What I expect to be printed is: 39, 67.
Can someone explain why \\. doesn't work? Before I added \\., my regex was working fine up to that point.

Pshemo
Vlad Dumitrache
  • What do you think `[(a-z)+]` does? Also, why do you escape `@`? – Pshemo Dec 25 '13 at 02:04
  • @Pshemo: I thought `[(a-z)+]` makes a letter from a to z appear one or more times. I escape @ because I thought a mail ID must have at least 3 letters, like s1s@something.com (or a string). – Vlad Dumitrache Dec 25 '13 at 02:13
  • @Vlad, no, `[(a-z)+]` doesn't quite do what you think it does. You are confusing `[]` and `()`. – Dawood ibn Kareem Dec 25 '13 at 02:16
  • @VladDumitrache No, `[...]` is a [character class](http://www.regular-expressions.info/charclass.html), and `[(a-z)+]` means: accept one character that is `(`, or in the `a-z` range, or `)`, or a `+` sign. Maybe try `[a-z\\d]{3,}@[a-z]+\\.[a-z]+`, but this will also accept user names made only of digits. If you want to guarantee that the first character is a letter, you can change `[a-z\\d]{3,}` to `[a-z][a-z\\d]{2,}`. – Pshemo Dec 25 '13 at 02:16
  • Ah, I understand now :D. I'll try to rewrite it and hope to succeed. If I don't, I'll ask for your help again, if you can help me :). – Vlad Dumitrache Dec 25 '13 at 02:21
  • @VladDumitrache Just out of curiosity, why do you even validate e-mail with a regex? Just send the user an activation link by e-mail. If the address is not valid, the user will not be able to activate the account. That should be enough to tell whether the e-mail is correct. Take a look at [this article](http://davidcel.is/blog/2012/09/06/stop-validating-email-addresses-with-regex/). – Pshemo Dec 25 '13 at 02:23
  • I mean when a user registers somewhere, let's say on a website, an email is required. It's a form of anti-spam. This regex forces you to type a valid mail address. The result is: "[a-z]+[(a-z)\\d]+{2,}\\@[a-z]+\\.[a-z]". Thanks a lot guys :). – Vlad Dumitrache Dec 25 '13 at 02:31
  • OK, do as you wish. Anyway, you probably want to change `+{2,}` into `{2,}` to avoid backtracking. You also don't need to escape `@`, so remove the \\ before it. The last thing I suspect is wrong is the missing `+` at the end of the regex (currently you accept only one character after the dot). – Pshemo Dec 25 '13 at 02:39
  • Did you try googling for regular expressions that match valid email addresses? You are not the first person to want to solve this particular problem. – Dawood ibn Kareem Dec 25 '13 at 02:40
  • I googled it, but what I searched for was regular expressions in general. This was practice for me, to understand regular expressions better. Indeed, I forgot the + at the end of the regex. You said to change +{2,} into {2,} to avoid backtracking. Can you give me an example, please? Greets. – Vlad Dumitrache Dec 25 '13 at 12:38
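
The pattern suggested in the comments above can be tried against the sample string from the question. This is only a minimal sketch: the class name `EmailPositions` and the helper `findStarts` are made up for illustration, and the pattern is the one from the comments (a leading letter, `{2,}` instead of `+{2,}`, an unescaped `@`, and a trailing `+`), not a general-purpose e-mail validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are not from the original post.
public class EmailPositions {
    // One letter, then at least two letters or digits, '@',
    // a lowercase domain, a literal dot, and a lowercase suffix.
    static final Pattern MAIL = Pattern.compile("[a-z][a-z\\d]{2,}@[a-z]+\\.[a-z]+");

    // Returns the start index of every match found in the input.
    static List<Integer> findStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = MAIL.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "dasdsa@2 @ada. ss2@dad.2om p2@   2@2.2 "
                + "fad2@yahoo.com 22@yahoo.com fad@yahoo.com";
        System.out.println(findStarts(input)); // [39, 67]
    }
}
```

On the question's input this should report the two positions the asker expected, 39 and 67.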

1 Answer


Change your pattern to the following:

[a-z]+[a-z\\d]+{3,}\\@[a-z]+\\.[a-z]+

So the code will be:

Pattern pat3 = Pattern.compile("[a-z]+[a-z\\d]+{3,}\\@[a-z]+\\.[a-z]+");

// Your Code

while(mat3.find()){
    System.out.println("Position: " + mat3.start() + " ---  Match: " + mat3.group());
}

This gives the following result:

Pattern :: [a-z]+[a-z\d]+{3,}\@[a-z]+\.[a-z]+
Position: 39 ---  Match: fad2@yahoo.com
Position: 67 ---  Match: fad@yahoo.com

Explanation:

You wrote the pattern as

[(a-z)+][(a-z\\d)]+{3,}\\@[(a-z)+]\\.[(a-z)+]

The character class [(a-z)+] does not match one or more lower-case letters. It matches exactly one character, which can be any of these: (, a-z, ), +

To match one or more lower-case letters, the quantifier has to go outside the brackets: [a-z]+
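
The difference between the two forms can be checked directly. The following is a small illustrative sketch; the class name `CharClassDemo` and the helper `matchesWhole` are hypothetical, and `Pattern.matches` is used because it tests the whole input against the regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are not from the original post.
public class CharClassDemo {
    // True only if the entire string matches the regex.
    static boolean matchesWhole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // Inside [...], the characters ( ) and + are literals,
        // so this class matches exactly one character.
        System.out.println(matchesWhole("[(a-z)+]", "+"));   // true
        System.out.println(matchesWhole("[(a-z)+]", "("));   // true
        System.out.println(matchesWhole("[(a-z)+]", "abc")); // false

        // With the quantifier outside the brackets, repetition works.
        System.out.println(matchesWhole("[a-z]+", "abc"));   // true
        System.out.println(matchesWhole("[a-z]+", "+"));     // false
    }
}
```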

So if you remove the \\. part from your pattern, the loop

while(mat3.find()){
    System.out.println("Position: " + mat3.start() + " ---  Match: " + mat3.group());
}

will give:

Pattern :: [(a-z)+][(a-z\d)]+{3,}\@[(a-z)+][(a-z)+]
Position: 15 ---  Match: ss2@da     // not ss2@dad
Position: 39 ---  Match: fad2@ya    // not fad2@yahoo
Position: 67 ---  Match: fad@ya     // not fad@yahoo
Denim Datta
  • I managed to resolve it from Pshemo's answer. I was thinking about [a-z]+[a-z\\d]+{2,}\\@[a-z]+\\.[a-z]+ but I'm voting for your answer for being so thorough and to the point. Thanks a lot and have a happy Christmas! – Vlad Dumitrache Dec 25 '13 at 12:32