21

I am cleaning up some email addresses that I got from Tesseract OCR.

Here is my code:

if (email != null) {
    email = email.replaceAll(" ", "");
    email = email.replaceAll("caneer", "career");
    email = email.replaceAll("canaer", "career");
    email = email.replaceAll("canear", "career");
    email = email.replaceAll("caraer", "career");
    email = email.replaceAll("carear", "career");
    email = email.replace("|", "l");
    email = email.replaceAll("}", "j");
    email = email.replaceAll("j3b", "job");
    email = email.replaceAll("gmaii.com", "gmail.com");
    email = email.replaceAll("hotmaii.com", "hotmail.com");
    email = email.replaceAll(".c0m", ".com");
    email = email.replaceAll(".coin", ".com");
    email = email.replaceAll("consuit", "consult");
}
return email;

But the output is not correct.

Input :

amrut=ac.hrworks@g mai|.com

Output :

lalcl.lhlrlwlolrlklsl@lglmlalil|l.lclolml

But when I assign the result to a new String after every replacement, it works fine. Why does repeated assignment to the same String not work?

Stephen C
Neeraj
  • When I copy/paste that in (and fix the typo on line 5), it results in what seems to be a correct result - "amrut=ac.hrworks@gmail.com". – Ren Feb 12 '13 at 05:45
  • 3
    Consider using String.replace instead of replaceAll. It does exactly what I believe you expected replaceAll to do. – Buhb Feb 12 '13 at 06:58
  • 3
    My eyes hurt from seeing code like that. – user Feb 12 '13 at 13:28

6 Answers

45

You'll note in the Javadoc for String.replaceAll() that the first argument is a regular expression.

A period (.) has a special meaning there, as do a pipe (|) and a curly brace (}). You need to escape them all, for example:

email = email.replaceAll("gmaii\\.com", "gmail.com");
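To make the effect of the metacharacters concrete, here is a minimal sketch comparing unescaped and escaped patterns. The class and method names are made up for illustration; only String.replaceAll itself comes from the question. Note that an unescaped | matches the empty string at every position, which would explain output like the one shown in the question.

```java
class RegexMetacharDemo {
    // Unescaped '.' matches ANY character, so "xc0m" is rewritten as well.
    static String unescapedDot(String s) {
        return s.replaceAll(".c0m", ".com");
    }

    // Escaped '.' matches only a literal period.
    static String escapedDot(String s) {
        return s.replaceAll("\\.c0m", ".com");
    }

    // Unescaped '|' is an alternation of two empty patterns: it matches the
    // empty string between every pair of characters and never the pipe itself.
    static String unescapedPipe(String s) {
        return s.replaceAll("|", "l");
    }

    // Escaped '|' matches a literal pipe character.
    static String escapedPipe(String s) {
        return s.replaceAll("\\|", "l");
    }

    public static void main(String[] args) {
        System.out.println(unescapedDot("xc0m"));      // .com
        System.out.println(escapedDot("xc0m"));        // xc0m
        System.out.println(unescapedPipe("ai|"));      // lalil|l
        System.out.println(escapedPipe("mai|.com"));   // mail.com
    }
}
```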
Brian Roach
14

(Is this Java?)

Note that in Java, replaceAll accepts a regular expression and the dot matches any character. You need to escape the dot or use

somestring.replaceAll(Pattern.quote("gmail.com"), "replacement");

Also note the typo here:

email = emai.replaceAll("canear", "career");

should be

email = email.replaceAll("canear", "career");
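As a rough illustration of the quoting approach, the sketch below puts Pattern.quote next to String.replace, which treats its first argument as literal text and also replaces every occurrence. The class and method names are hypothetical, and the "gmaii.com" pattern is borrowed from the question.

```java
import java.util.regex.Pattern;

class LiteralReplaceDemo {
    // Pattern.quote turns the whole string into a literal pattern,
    // so the '.' no longer matches arbitrary characters.
    static String withQuote(String s) {
        return s.replaceAll(Pattern.quote("gmaii.com"), "gmail.com");
    }

    // String.replace(CharSequence, CharSequence) never uses regex at all.
    static String withReplace(String s) {
        return s.replace("gmaii.com", "gmail.com");
    }

    public static void main(String[] args) {
        System.out.println(withQuote("bob@gmaii.com"));    // bob@gmail.com
        System.out.println(withQuote("bob@gmaiixcom"));    // bob@gmaiixcom
        System.out.println(withReplace("bob@gmaii.com"));  // bob@gmail.com
    }
}
```

Both variants leave "gmaiixcom" alone, whereas the unquoted regex "gmaii.com" would have rewritten it.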
David M. R.
6

You have to escape . as \\. like this:

if (email != null) {
    email = email.replaceAll(" ", "");
    email = email.replaceAll("caneer", "career");
    email = email.replaceAll("canaer", "career");
    email = email.replaceAll("canear", "career");
    email = email.replaceAll("caraer", "career");
    email = email.replaceAll("carear", "career");
    email = email.replace("|", "l");
    email = email.replaceAll("}", "j");
    email = email.replaceAll("j3b", "job");
    email = email.replaceAll("gmaii\\.com", "gmail.com");
    email = email.replaceAll("hotmaii\\.com", "hotmail.com");
    email = email.replaceAll("\\.c0m", ".com");
    email = email.replaceAll("\\.coin", ".com");
    email = email.replaceAll("consuit", "consult");
}
return email;
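
If none of the fixes actually needs regex, one possible alternative is to keep them in an ordered table and apply each with String.replace, as suggested in the comments on the question. This is only a sketch: the OcrEmailCleaner class, the FIXES map, and the subset of fixes in it are illustrative, not the asker's real code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OcrEmailCleaner {
    // Insertion order is preserved, so the fixes run in the listed order.
    private static final Map<String, String> FIXES = new LinkedHashMap<>();
    static {
        FIXES.put(" ", "");                 // drop stray spaces
        FIXES.put("|", "l");                // OCR reads 'l' as a pipe
        FIXES.put("}", "j");
        FIXES.put("gmaii.com", "gmail.com");
        FIXES.put(".c0m", ".com");
        FIXES.put(".coin", ".com");
    }

    static String clean(String email) {
        if (email == null) {
            return null;
        }
        for (Map.Entry<String, String> fix : FIXES.entrySet()) {
            // Literal replacement of every occurrence; no regex involved.
            email = email.replace(fix.getKey(), fix.getValue());
        }
        return email;
    }

    public static void main(String[] args) {
        // Input taken from the question.
        System.out.println(clean("amrut=ac.hrworks@g mai|.com"));
        // amrut=ac.hrworks@gmail.com
    }
}
```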
Sumit Singh
6

Once you realize that the first argument of replaceAll() is a regex, you can get by with far fewer replacement calls.

For example, you can catch the possible misspellings of the word career with the following regex:

email = email.replaceAll("ca[nr][ea][ea]r", "career");
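
Here is a small sketch of the character-class idea, checked against the five misspellings from the question. Inside a character class a | is a literal pipe rather than alternation, so the class only needs the candidate letters. The class and method names are invented for the example.

```java
class CareerFixDemo {
    // [nr] matches one 'n' or 'r'; [ea] matches one 'e' or 'a'.
    static String fixCareer(String s) {
        return s.replaceAll("ca[nr][ea][ea]r", "career");
    }

    // Same idea with '|' inside the classes: the pipe becomes a member
    // of each class, so OCR noise like "ca|eer" is matched too.
    static String fixCareerWithPipes(String s) {
        return s.replaceAll("ca[n|r][e|a][e|a]r", "career");
    }

    public static void main(String[] args) {
        String[] samples = {"caneer", "canaer", "canear", "caraer", "carear"};
        for (String sample : samples) {
            System.out.println(fixCareer(sample));       // career (each time)
        }
        System.out.println(fixCareer("ca|eer"));          // ca|eer
        System.out.println(fixCareerWithPipes("ca|eer")); // career
    }
}
```

The pattern is broader than the original five replacements (it also rewrites strings such as "caraar"), which may or may not be what you want.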

iTech
5

You are using some regex characters.

Please escape them with \ or use the Pattern.quote method.

Thihara
5

I think you are not aware that the first parameter of replaceAll is a regex.

The characters ., | and } may be interpreted differently from what you expect.

.   Any character (may or may not match line terminators)

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html

For whitespace, it is better to use

\s  A whitespace character: [ \t\n\x0B\f\r]

and escape other special characters with a leading \\
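
A minimal sketch of the \s suggestion, assuming you want to drop every kind of whitespace and not just the space character. The class and method names are hypothetical.

```java
class WhitespaceDemo {
    // "\\s+" in a Java string literal is the regex \s+ :
    // one or more whitespace characters (space, tab, newline, ...).
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWhitespace("g mai|.com"));   // gmai|.com
        System.out.println(stripWhitespace("a\tb\n c"));     // abc
    }
}
```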

Nikolay Kuznetsov