A valid Java identifier:
- has at least one character
- starts with a letter [a-zA-Z], an underscore _, or a dollar sign $
- MAY continue with letters, digits, underscores, or dollar signs
- MUST NOT be a reserved word
- Update: MUST NOT be a single underscore _, which has been a keyword since Java 9
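Before turning to regular expressions, the rules above can be written as a plain character check. This is only a minimal sketch: the class name `IdentifierRules`, its helper methods, and the short `RESERVED_SAMPLE` set are assumptions made for illustration, and the check is ASCII-only like the list above (the language itself also accepts other Unicode letters).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class IdentifierRules {
    // Illustrative subset of the reserved words (assumption: not the full list).
    static final Set<String> RESERVED_SAMPLE =
            new HashSet<>(Arrays.asList("if", "else", "for", "float", "int", "_"));

    // First character: ASCII letter, underscore, or dollar sign.
    static boolean isStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '$';
    }

    // Remaining characters: anything allowed at the start, plus digits.
    static boolean isPart(char c) {
        return isStart(c) || (c >= '0' && c <= '9');
    }

    static boolean isValidIdentifier(String s) {
        if (s == null || s.isEmpty() || !isStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isPart(s.charAt(i))) {
                return false;
            }
        }
        // A whole-word comparison: "integer" is fine, "int" is not.
        return !RESERVED_SAMPLE.contains(s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"count", "$1", "_c0", "1abc", "int", "_"}) {
            System.out.println(s + " -> " + isValidIdentifier(s));
        }
    }
}
```

This validates a single candidate string; the regular expressions below solve the different problem of finding identifiers inside a larger piece of code.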
A naive regexp that validates the first three conditions is (\b([A-Za-z_$][$\w]*)\b), but it does not filter out reserved words.
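To see the problem with the naive pattern, here is a small sketch that runs it over a one-line snippet. The class name `NaiveIdentifierDemo` and the helper `find` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveIdentifierDemo {
    // The naive pattern: a start character followed by any number of part characters.
    static final Pattern NAIVE = Pattern.compile("(\\b([A-Za-z_$][$\\w]*)\\b)");

    // Collects every match of the naive pattern in the given code.
    static List<String> find(String code) {
        List<String> found = new ArrayList<>();
        Matcher m = NAIVE.matcher(code);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The keyword "int" is reported next to "count": nothing excludes reserved words yet.
        System.out.println(find("int count = 0;"));
    }
}
```

For this input the sketch prints [int, count].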
To exclude reserved words, a negative lookahead (?!...) is needed to specify a group of tokens that must not match:
\b(?!(_\b|if|else|for|float|int))([A-Za-z_$][$\w]*)
- Group #1, inside the lookahead (?!(_\b|if|else|for|float|int)), lists the words to exclude.
- Group #2, ([A-Za-z_$][$\w]*), matches identifiers.
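A short sketch of the lookahead version with the same abbreviated keyword list. `LookaheadDemo` and `find` are hypothetical names; the identifier is read from group 2, because group 1 sits inside the negative lookahead and never holds text on a successful match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Short keyword list from the text, not the full set of reserved words.
    static final Pattern ID = Pattern.compile(
            "\\b(?!(_\\b|if|else|for|float|int))([A-Za-z_$][$\\w]*)");

    // Collects group 2 (the identifier) of every match.
    static List<String> find(String code) {
        List<String> found = new ArrayList<>();
        Matcher m = ID.matcher(code);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // "int" and "float" are rejected by the lookahead, the variable names are kept.
        System.out.println(find("int count = 0; float ratio;"));
    }
}
```

For this input the sketch prints [count, ratio].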
However, the word boundary \b does not match between whitespace and a dollar sign $, because $ is not a word character, so this regular expression fails to match identifiers starting with $.
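The effect of \b on a leading $ can be checked directly. In this sketch (again with hypothetical names `DollarBoundaryDemo` and `find`), the \b-based pattern skips `$1` entirely, cuts `$abc` down to `abc`, and only handles `$` correctly when it appears after the first character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarBoundaryDemo {
    // Same \b-based pattern as above, with the short keyword list.
    static final Pattern ID = Pattern.compile(
            "\\b(?!(_\\b|if|else|for|float|int))([A-Za-z_$][$\\w]*)");

    static List<String> find(String code) {
        List<String> found = new ArrayList<>();
        Matcher m = ID.matcher(code);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("int $1;"));    // no boundary before '$', nothing is found
        System.out.println(find("int $abc;"));  // boundary sits after '$', so only "abc" is found
        System.out.println(find("int a$b;"));   // '$' inside the name is fine: "a$b"
    }
}
```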
Also, we may want to avoid matching inside string and character literals ("not_a_variable", 'c', '\u65').
This can be done by replacing the word boundary \b with a positive lookbehind (?<=...), which requires a character from the given class before the main expression without including it in the result:
(?<=[^$\w'"\\])(?!(_\b|if|else|for|float|int))([A-Za-z_$][$\w]*)
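The lookbehind version can be tried on a snippet that mixes a `$` identifier with character and string literals. `LookbehindDemo` and `find` are hypothetical names for this sketch, and the keyword list is still the short one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // The previous character must not be an identifier character, a quote, or a backslash.
    static final Pattern ID = Pattern.compile(
            "(?<=[^$\\w'\"\\\\])(?!(_\\b|if|else|for|float|int))([A-Za-z_$][$\\w]*)");

    static List<String> find(String code) {
        List<String> found = new ArrayList<>();
        Matcher m = ID.matcher(code);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "$x" is found now; 'c' and "name" are skipped because a quote precedes them.
        System.out.println(find("int $x = 'c'; String s = \"name\";"));
        // Only the character right before a token is inspected.
        System.out.println(find("String t = \"two words\";"));
    }
}
```

The first call prints [$x, String, s] and the second prints [t, words]. The second result shows two limits that follow from how the lookbehind works: it needs a preceding character, so a token at the very start of the input (here `String`) is skipped, and it looks at only that one character, so a later word inside a multi-word string literal is still reported.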
Next, the full list of Java reserved words can be collected into a single String of tokens separated with |.
A test class showing the final regular expression pattern and its use to detect Java identifiers is provided below.
import java.util.Arrays;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class IdFinder {
    // Reserved words, plus a lone underscore ("_\\b"), which is a keyword since Java 9
    static final List<String> RESERVED = Arrays.asList(
        "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char", "class", "const",
        "continue", "default", "double", "do", "else", "enum", "extends", "false", "final", "finally",
        "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long",
        "native", "new", "null", "package", "private", "protected", "public", "return", "short", "static",
        "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "true", "try",
        "void", "volatile", "while", "_\\b"
    );

    // All reserved words joined into one regex alternation
    static final String JAVA_KEYWORDS = String.join("|", RESERVED);

    // Lookbehind: the previous character is not an identifier character, a quote, or a backslash.
    // Negative lookahead: no reserved word starts at this position.
    // Final group: the identifier itself.
    static final Pattern VALID_IDENTIFIERS = Pattern.compile(
        "(?<=[^$\\w'\"\\\\])(?!(" + JAVA_KEYWORDS + "))([A-Za-z_$][$\\w]*)");

    public static void main(String[] args) {
        System.out.println("ID pattern:\n" + VALID_IDENTIFIERS.pattern());

        String code = "public class Main {\n\tstatic int $1;\n\tprotected char _c0 = '\\u65';\n\tprivate long c1__$$;\n}";
        System.out.println("\nIdentifiers in the following code:\n=====\n" + code + "\n=====");

        // Matcher.results() requires Java 9 or later
        VALID_IDENTIFIERS.matcher(code).results()
            .map(MatchResult::group)
            .forEach(System.out::println);
    }
}
Output
ID pattern:
(?<=[^$\w'"\\])(?!(abstract|assert|boolean|break|byte|case|catch|char|class|const|continue|default|double|do|else|enum|extends|false|final|finally|float|for|goto|if|implements|import|instanceof|int|interface|long|native|new|null|package|private|protected|public|return|short|static|strictfp|super|switch|synchronized|this|throw|throws|transient|true|try|void|volatile|while|_\b))([A-Za-z_$][$\w]*)
Identifiers in the following code:
=====
public class Main {
	static int $1;
	protected char _c0 = '\u65';
	private long c1__$$;
}
=====
Main
$1
_c0
c1__$$
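One behaviour worth checking: the negative lookahead rejects any token that merely starts with a reserved word, so names such as `interval`, `format`, or `done` are not reported. The sketch below compares the pattern shape used above with a possible refinement that excludes a keyword only when no further identifier character follows it. The refinement, the class name `KeywordBoundaryDemo`, and the shortened keyword list are assumptions made for illustration and are not part of the pattern above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordBoundaryDemo {
    // Pattern shape from the text, with an illustrative subset of keywords.
    static final Pattern ORIGINAL = Pattern.compile(
            "(?<=[^$\\w'\"\\\\])(?!(do|for|if|int|_\\b))([A-Za-z_$][$\\w]*)");

    // Assumed refinement: a keyword is excluded only if it is not followed by
    // another identifier character, i.e. only if it is the whole token.
    static final Pattern REFINED = Pattern.compile(
            "(?<=[^$\\w'\"\\\\])(?!(?:do|for|if|int|_)(?![$\\w]))([A-Za-z_$][$\\w]*)");

    static List<String> find(Pattern pattern, String code) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(code);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String code = "int interval = format(done);";
        System.out.println(find(ORIGINAL, code)); // names starting with a keyword are dropped
        System.out.println(find(REFINED, code));  // whole-word keywords only are dropped
    }
}
```

For this input the first call prints [] and the second prints [interval, format, done]. Using (?![$\w]) rather than \b after the keyword list also keeps a name like `_$` from being treated as a lone underscore.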