9

I want to check whether the command-line arguments passed to my script conform to a valid Java package name, but the regex I have is not working properly. With the following code, passing in com.example.package produces the error message. What is wrong with my regex?

 regex="/^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+[0-9a-z_]$/i"
 if ! [[ $1 =~ $regex ]]; then
         >&2 echo "ERROR: invalid package name arg 1: $1"
         exit 2
 fi
Whoppa
  • Do you care about keywords? Invalid package names like "package" or "com.example.class" would be hard to detect with a simple regex, I'd guess. Or do you just need a very basic syntax check (if so, you might want to know that uppercase characters are actually valid)? – Marvin Apr 21 '15 at 21:27

4 Answers

11

You are pretty close to the correct solution. Drop the /.../i delimiters from the regex (also consider @fede's simpler regex) and set the nocasematch option for case-insensitive matching. For example:

regex='^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+[0-9a-z_]$'

shopt -s nocasematch
if ! [[ $1 =~ $regex ]]; then
  exit 2
fi
shopt -u nocasematch

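You are probably being misled by other languages that use /regex/i (JavaScript) or qr/regex/i (Perl) to define a case-insensitive regex object. In bash, the =~ operator takes the pattern as a plain string, so the slashes and the trailing i would be matched literally.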
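To illustrate the point about delimiters, here is a minimal sketch that runs the same name against both forms of the pattern. The variable names bad, good, bad_result and good_result are only for illustration and are not part of the original code.

```shell
#!/usr/bin/env bash
# bash's =~ treats the whole string as a POSIX extended regex:
# the slashes and the trailing "i" are ordinary characters here,
# not delimiters or a case-insensitivity flag.
bad='/^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+[0-9a-z_]$/i'
good='^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+[0-9a-z_]$'

name='com.example.package'

if [[ $name =~ $bad ]]; then bad_result='match'; else bad_result='no match'; fi
if [[ $name =~ $good ]]; then good_result='match'; else good_result='no match'; fi

echo "with /.../i delimiters: $bad_result"
echo "without delimiters:     $good_result"
```

The first pattern cannot match because it requires a literal / before the start-of-string anchor.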

Btw, using grep -Eqi (-E enables the extended syntax that this regex needs) is another, more portable solution. Cheers.
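A minimal sketch of the grep variant, assuming the same regex as in the answer above. The function name is_valid_package is hypothetical. Since grep matches line by line, this sketch assumes the argument contains no newline.

```shell
#!/bin/sh
# -E: extended regex syntax, -q: no output (exit status only), -i: ignore case.
regex='^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+[0-9a-z_]$'

is_valid_package() {
  # printf avoids surprises that echo has with arguments such as "-n".
  printf '%s\n' "$1" | grep -Eqi -- "$regex"
}

if is_valid_package "Com.Example.Package"; then
  echo "valid"
else
  echo "invalid"
fi
```

This form needs no shopt call and also works in shells without [[ ... ]].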

gvalkov
  • Package "a" doesn't match. – Marboni Dec 05 '19 at 01:22
  • @Marboni I think maybe it's nicer like this: "^[a-z][a-z0-9_]+(\.[a-z0-9_]*)*$", although I prefer it like this (in Java): "(?x) [a-z][a-z0-9_]* ( \\. [a-z][a-z0-9_]* ) *". The (?x) flag makes it a lot easier to read regexes. – Jonathan Locke Mar 09 '21 at 03:22
  • A package name is allowed to start with "_", and uppercase letters are allowed. – bline May 19 '21 at 08:00
7

You could use a simpler regex like this:

(?:^\w+|\w+\.\w+)+$


Federico Piazza
2

As per Java package naming conventions, the basic syntax

  • allows only lowercase English letters, digits, _ and .
  • must start with a letter
  • can't be malformed, e.g. contain .., end with ., or contain Java keywords

Ignoring the keyword constraint, the regex can be

^[a-z][a-z0-9_]*(\.[a-z0-9_]+)*[a-z0-9_]*$

//regex breakdown
//^[a-z]            start with one lowercase English letter
//[a-z0-9_]*        followed by zero or more letters, digits or _
//(\.[a-z0-9_]+)*   zero or more of these:
//                          . followed by one or more letters, digits or _
//[a-z0-9_]*$       optionally more letters, digits or _ up to the end
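The keyword constraint that the regex ignores could be handled as a separate step. The following is only a sketch, assuming the name has already passed the syntax check. The function name has_keyword_segment and the array java_keywords are hypothetical. The list holds the Java reserved keywords plus the literals true, false and null, none of which can be used as an identifier.

```shell
#!/usr/bin/env bash
# Words that cannot be used as a package name segment.
# "_" on its own has been a keyword since Java 9.
java_keywords=(abstract assert boolean break byte case catch char class const
  continue default do double else enum extends final finally float for goto if
  implements import instanceof int interface long native new package private
  protected public return short static strictfp super switch synchronized this
  throw throws transient try void volatile while true false null _)

# Returns 0 if any dot-separated segment of $1 is a reserved word.
has_keyword_segment() {
  local -a segments
  local segment keyword
  IFS=. read -r -a segments <<< "$1"
  for segment in "${segments[@]}"; do
    for keyword in "${java_keywords[@]}"; do
      [[ $segment == "$keyword" ]] && return 0
    done
  done
  return 1
}

for name in com.example.package com.example.class com.example.pkg; do
  if has_keyword_segment "$name"; then
    echo "$name: contains a reserved word"
  else
    echo "$name: ok"
  fi
done
```

The comparison is case-sensitive on purpose, since Java keywords are lowercase and a segment such as Class is a legal identifier.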


This differs from @gvalkov's answer by

  • allowing the number of .word groups to be zero, so a single-segment name such as a matches
  • making the final character check optional with *, so a name whose last segment is a single character, such as a.b, also matches
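The difference between the two patterns can be checked with a short script. This is a sketch only; the helper name matches and the variables gvalkov and lineage are hypothetical labels for the two regexes quoted in this thread.

```shell
#!/usr/bin/env bash
# Run both patterns against a few sample names and print the outcome.
gvalkov='^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+[0-9a-z_]$'
lineage='^[a-z][a-z0-9_]*(\.[a-z0-9_]+)*[a-z0-9_]*$'

matches() {  # usage: matches <regex> <name>
  [[ $2 =~ $1 ]]
}

for name in a a.b com.example.package com..example com.example.; do
  g=no; l=no
  matches "$gvalkov" "$name" && g=yes
  matches "$lineage" "$name" && l=yes
  printf '%-22s gvalkov=%-3s lineage=%s\n' "$name" "$g" "$l"
done
```

Only the second pattern accepts a and a.b; both accept com.example.package and reject the two malformed names.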
lineage
1

This is probably a better approach:

^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*$
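Note that (?:...) is non-capturing group syntax from Perl-style and Java regexes and is not part of the POSIX extended syntax that bash's =~ uses. A minimal sketch of the same pattern for the bash script in the question, assuming a plain group is an acceptable substitute (the function name is_valid_package is hypothetical):

```shell
#!/usr/bin/env bash
# Same pattern, with the non-capturing group (?:...) written as a plain group.
regex='^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$'

is_valid_package() {
  [[ $1 =~ $regex ]]
}

for name in a _internal com.Example.pkg 2pak.is.invalid com..example; do
  if is_valid_package "$name"; then
    echo "$name: valid"
  else
    echo "$name: invalid"
  fi
done
```

Because the character classes list both cases explicitly, no nocasematch setting is needed.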
Sina Salmani
  • This is the only correct solution. Java package names may contain mixed case, though they _should_ be all lowercase. Also, this pattern does not allow package names like `2pak.is.invalid`. – Udo Klimaschewski Dec 05 '22 at 14:19