2

I need to protect (mask) the email addresses contained in a text, ideally with a regular expression.

Example:

Hi:

My Name is Alex and my mail is alexmail@domain.com but you can reply to
alexreply@other.domain.com.

Desired output:

Hi:

My Name is Alex and my mail is ale****@domain.com but you can reply to
ale****@other.domain.com.

The logic is: keep the first 3 characters of the part before the @ (or all of them, if there are fewer than 3) and replace the rest, up to the @, with ****.

a@mail.com     => a****@mail.com
ab@mail.com    => ab****@mail.com
abc@mail.com   => abc****@mail.com
abcd@mail.com  => abc****@mail.com
abcde@mail.com => abc****@mail.com

I wrote a function that protects a single address this way, but it does not help with a text containing several addresses, because I cannot use it with replaceAll.

public static String protectEmailAddress(String emailAddress) {
     String[] split = emailAddress.split("@");
     if (split[0].length() >= 3) {
         split[0] = split[0].substring(0, 3);  
     }
     emailAddress = StringUtils.join(split, "****@");

     return emailAddress;
}

So basically what I need is a regex that does this. Something similar to this, but applied to a different part of the address, if possible.

Thanks...


5 Answers

7

You can use (\\w{1,3})(\\w+)(@.*)

String str = "alexreply@other.domain.com";
str = str.replaceAll("(\\w{1,3})(\\w+)(@.*)", "$1****$3");
System.out.println(str);

OUTPUT

ale****@other.domain.com

Explanation :

  • (\\w{1,3}) : matches 1 to 3 word characters
  • (\\w+) : matches one or more word characters
  • (@.*) : matches the @ and everything after it
  • $1 : means group one which is (\\w{1,3})
  • $3 : means group three which is (@.*)
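A hedged side note, not part of the answer above: as written, the pattern needs at least two word characters before the @, so a@mail.com is left unchanged and abc@mail.com becomes ab****@mail.com, and the greedy (@.*) consumes the rest of the line. One possible variant for a text with several addresses makes the hidden part optional and stops at the @. It assumes \\w is an acceptable approximation of the local part; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class MaskEmailsSketch {
    // Group 1: the first 1 to 3 word characters; \w* then skips the rest up to the '@'.
    private static final Pattern LOCAL_PART = Pattern.compile("(\\w{1,3})\\w*@");

    // Replaces every "<word chars>@" run, so it works on text with several addresses.
    static String mask(String text) {
        return LOCAL_PART.matcher(text).replaceAll("$1****@");
    }

    public static void main(String[] args) {
        System.out.println(mask("a@mail.com"));    // a****@mail.com
        System.out.println(mask("abc@mail.com"));  // abc****@mail.com
        System.out.println(mask("mail me at alexmail@domain.com or alexreply@other.domain.com."));
        // mail me at ale****@domain.com or ale****@other.domain.com.
    }
}
```

Because \\w does not match a dot, a local part such as first.last is only masked from its last dot onward (first.las****@...), and any word followed by @ is treated as an address.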
Eduardo Sanchez-Ros
akash
  • This works, but as @arshajii said, it's prone to false positives. If you are sure the string is already an email address, it works fine, but only for a single address. – Alejandro Gomez Dec 23 '15 at 17:39
  • 1
    I made a small change to the regex, (\\w{1,3})(\\w+.*)(@.*), to handle longer addresses with dots in the first part. Thanks for this post. – Michal Cholewiński Feb 28 '18 at 10:21
2

You could probably use something like:

text = text.replaceAll("\\S{1,4}@","****@");

It should replace 1 to 4 ({1,4}) non-whitespace characters (\\S) which are followed by @ with ****@.

So it will transform text like this:

a@          -> ****@
ab@         -> ****@
abc@        -> ****@
abcd@       -> ****@
abcde@      -> a****@
abcdef@     -> ab****@
Pshemo
  • My first approach was this regular expression, but I need to show the first three characters and replace the rest with * up to the @, so I opted for the function. – Alejandro Gomez Dec 23 '15 at 17:06
  • Oh, so the logic is "leave 3 characters and hide the rest", not "hide the last 4 characters". I will try to update it. – Pshemo Dec 23 '15 at 17:07
  • The downside of this approach is that it's prone to false positives if I have `@`s in my text that aren't part of an email address. It's typically not a great idea to parse email addresses using a regex in the first place. – arshajii Dec 23 '15 at 17:13
  • @arshajii Yes, parsing emails is not a trivial task, and there are many traps that can make our code return false positives. The best approach would probably be to use a library that finds all the emails and then change them manually (even with a little help from regex). – Pshemo Dec 23 '15 at 17:17
  • Do you have any library in mind? I regularly use Apache Commons but I found nothing there... – Alejandro Gomez Dec 23 '15 at 17:21
  • @AlejandroGómez Unfortunately no, Sorry. – Pshemo Dec 23 '15 at 17:24
  • @AlejandroGómez I came up with this solution `replaceAll("(?<=\\S{3})\\S(?=\\S*@)", "*")`, but since it has to scan backward and forward for each character it checks, I would not use it if efficiency is important (also, `\\S` matches any non-whitespace character, so it can give some false positives). Your code looks like the right approach. – Pshemo Dec 23 '15 at 17:28
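For illustration only, here is a minimal sketch of how the lookaround pattern from the last comment behaves. The class and method names are made up. Note that it writes one * per hidden character rather than a fixed ****, and leaves local parts of 3 characters or fewer untouched.

```java
class LookaroundMaskSketch {
    // Replaces each non-whitespace character that has 3 non-whitespace characters
    // directly before it and an '@' later in the same whitespace-free run.
    static String mask(String text) {
        return text.replaceAll("(?<=\\S{3})\\S(?=\\S*@)", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("alexmail@domain.com")); // ale*****@domain.com
        System.out.println(mask("abcd@mail.com"));       // abc*@mail.com
        System.out.println(mask("abc@mail.com"));        // abc@mail.com (nothing to hide)
    }
}
```

The number of stars reveals the length of the local part, which may or may not matter for the use case in the question.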
1

I suggest the following approach:

public static void main(String[] args) {
        String text = "Hi:"
                + " "
                + "My Name is Alex and my mail is alexmail@domain.com but you can reply to "
                + "alexreply@other.domain.com."
                + " a@mail.com"
                + " abcd@mail.com";

        // Named groups: up to 3 leading characters of the local part (emailHead),
        // the rest of the local part (replacementEmailPart), and the "@domain" tail (emailTail).
        String emailPattern = "(?<emailHead>[_A-Za-z0-9-\\+]{1,3})+?(?<replacementEmailPart>[_A-Za-z0-9-\\+]*)*?(?<emailTail>@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})){1}";

        Pattern p = Pattern.compile(emailPattern);

        Matcher m = p.matcher(text);

        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replStr = m.group("replacementEmailPart");
            if (replStr != null) {
                replStr = replStr.replaceAll("[_A-Za-z0-9-\\+]", "*");
            } else {
                replStr = "****";
            }
            m.appendReplacement(sb, m.group("emailHead")
                    + replStr
                    + m.group("emailTail"));
        }
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
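A more compact variant of the same idea, offered as a sketch rather than a drop-in replacement: on Java 9 or later, Matcher.replaceAll accepts a function, so the explicit find/appendReplacement loop can be folded into one call. The pattern below is a simplified assumption (it also allows dots in the local part), the output uses a fixed **** as in the question, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CallbackMaskSketch {
    // Group 1: local part; group 2: '@' plus a dotted domain ending in a 2+ letter label.
    private static final Pattern EMAIL = Pattern.compile(
            "([A-Za-z0-9._+-]+)(@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,})");

    static String mask(String text) {
        // replaceAll(Function) requires Java 9+.
        return EMAIL.matcher(text).replaceAll(match -> {
            String local = match.group(1);
            String head = local.substring(0, Math.min(3, local.length()));
            // quoteReplacement keeps any '$' or '\' in the result from being read as a group reference.
            return Matcher.quoteReplacement(head + "****" + match.group(2));
        });
    }

    public static void main(String[] args) {
        System.out.println(mask("a@mail.com abcd@mail.com"));
        // a****@mail.com abc****@mail.com
        System.out.println(mask("reply to alexreply@other.domain.com."));
        // reply to ale****@other.domain.com.
    }
}
```

Requiring a domain after the @ avoids masking a stray @ that is not followed by something domain-like, at the cost of a pattern that still does not cover every valid address.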
Mikhailov Valentin
0

This method checks whether a string is a valid email address. Split your text into words and check each word with it; if a word is an email address, replace it with stars (*).

public static boolean isValidEmail(String str) {
    String pattern = "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$";
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(str);
    return m.matches();
}
Pshemo
Azat Nugusbayev
0

Regular expressions are not the right tool for this (see: Using a regular expression to validate an email address). Another approach would be to do something along these lines:

  1. Split your message into words (message.split("\\s+") or something to that effect).

  2. For each word, check if it is an e-mail address via the InternetAddress constructor:

    try {
        new InternetAddress(word, true);
        // valid e-mail address
    } catch (AddressException e) {
        // not an e-mail address
    }
    
  3. If a word is an e-mail address, "protect" it using your current function.

  4. Rejoin all of the words into a new message wherein the e-mail addresses are all protected.
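The four steps above can be sketched roughly as follows. To keep the example self-contained, the validity check is passed in as a Predicate, and SIMPLE_CHECK is only a stand-in assumption for the InternetAddress-based check from step 2 (which needs the JavaMail library). The class name, method names, and the reimplemented protect function are illustrative, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ProtectWordsSketch {
    // Stand-in validity check (an assumption for this sketch only):
    // something@something.tld with no whitespace and exactly one '@'.
    static final Predicate<String> SIMPLE_CHECK =
            word -> word.matches("[^@\\s]+@[^@\\s]+\\.[A-Za-z]{2,}");

    // Keeps at most the first 3 characters before the '@' and appends "****".
    static String protect(String email) {
        int at = email.indexOf('@');
        if (at < 0) {
            return email; // no '@': leave the word unchanged
        }
        String head = email.substring(0, Math.min(3, at));
        return head + "****" + email.substring(at);
    }

    static String protectAll(String message, Predicate<String> isEmail) {
        List<String> words = new ArrayList<>();
        for (String word : message.split("\\s+")) {            // step 1: split into words
            words.add(isEmail.test(word) ? protect(word) : word); // steps 2 and 3
        }
        return String.join(" ", words);                          // step 4: rejoin
    }

    public static void main(String[] args) {
        System.out.println(protectAll(
                "Write to alexmail@domain.com or abcd@mail.com today", SIMPLE_CHECK));
        // Write to ale****@domain.com or abc****@mail.com today
    }
}
```

Two caveats of this sketch: rejoining with a single space normalizes the original whitespace, and a word with trailing punctuation (such as "alexreply@other.domain.com." at the end of a sentence) is rejected by the stand-in check and therefore left unmasked.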


If you really want to use regular expressions on the other hand, then... well... you asked for it.

arshajii
  • I think I'll have to go with this solution. I expected to find a magical regular expression to resolve everything for me :D. But I'm worried about the performance of this approach. – Alejandro Gomez Dec 23 '15 at 17:44