using egrep for regular expression check on String

Question

I have to validate a String against a regular expression for a date format YYYYMMddhhmmss.

I have tested the below code:

temp=$(echo "$file_timestamp" | egrep '^(20)[0-9][0-9](0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])(0[0-9]|1[0-9]|2[0123])([0-5])[0-9]([0-5])[0-9]$')

This stores the content of file_timestamp in the variable temp if it matches the pattern; otherwise temp is left empty.

Is this code snippet optimized per unix standard?

please clarify your question - what do you mean by `validate my understanding`? — Nogard, Dec 26 '12 at 14:49
I wanted to know whether the code snippet is optimized and follows the unix standard. Is there any other optimized approach for the above? — Sumit Sahu, Dec 26 '12 at 14:52

score 0 · Answer 1 · answered Dec 26 '12 at 22:21

There won't be a unix standard that explicitly addresses your question.

Also, standards and good working code are sometimes at odds (though not that often).

I can think of at least 3 issues with code, even regular expressions, that as a developer I want to make sure are covered.

Are the results correct? Only you can know this. Building your code with test-driven development isn't something that just java-people can do. Make a file with the range of inputs you think you should support, and make sure the output is correct for all cases, AND if this is really a big project, have error messages showing what wasn't processed.
Is it maintainable? A block of comments detailing how you think the regular expression works will be helpful to those that come after you, OR even to yourself 6 months from now, when you haven't looked at a reg-exp since.
Performance. Is there an alternate "phrasing" of the regular expression that still gives a correct answer, but "runs" faster?
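The test-file idea from the first point can be sketched roughly as below. This is only an illustration: the names check_case and run_cases are made up here, the sample inputs are my own, and grep -E is used as the standard spelling of egrep. The cases sit in a here-document so the sketch is self-contained, but they could just as well live in a separate file.

```shell
#!/bin/sh
# Pattern copied from the question (egrep is the traditional name for grep -E).
pattern='^(20)[0-9][0-9](0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])(0[0-9]|1[0-9]|2[0123])([0-5])[0-9]([0-5])[0-9]$'

# check_case INPUT EXPECTED   (EXPECTED is "valid" or "invalid")
# Hypothetical helper: reports whether the pattern agrees with the expectation.
check_case() {
    if printf '%s\n' "$1" | grep -Eq "$pattern"; then
        actual=valid
    else
        actual=invalid
    fi
    if [ "$actual" = "$2" ]; then
        echo "ok   $1 ($actual)"
    else
        echo "FAIL $1: expected $2, got $actual" >&2
        return 1
    fi
}

# run_cases: feeds every "input expected" pair to check_case and counts failures.
run_cases() {
    failures=0
    while read -r input expected; do
        check_case "$input" "$expected" || failures=$((failures + 1))
    done <<'EOF'
20121226142100 valid
20121231235959 valid
20121010101010 valid
20121326142100 invalid
20121232142100 invalid
20121226242100 invalid
2012122614210 invalid
19991226142100 invalid
20120230120000 valid
EOF
    echo "$failures failure(s)"
    [ "$failures" -eq 0 ]
}

run_cases
```

The last case is marked valid on purpose: the pattern only checks the range of each field, so it cannot know that February has no 30th day. Writing that down as a test case documents the limitation.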

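Taking your reg-exp, I think I would have done it differently, given your definition. Note that this version is shorter, so by a very simple metric it's easier to maintain. It is also looser than yours: it accepts out-of-range fields such as month 13 or hour 25, so keep your alternations if those must be rejected.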

temp=$(echo "$file_timestamp" \
| egrep '^20[0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-5][0-9]$'
)

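Unless you care to explain your use of the ( .... ) grouping characters, I don't see any use for them around 20 and [0-5]. They are only needed where they enclose an alternation, such as (0[1-9]|1[012]).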

(The only thing I can think of is that your specified format YYYYMMddhhmmss is more flexible than you indicate. You're not trying to match any format of date the user might put in, i.e. YYYYMdhms (where the leading zero of any element is dropped)? Beware, this is a path to madness and incorrect data going into your system! ;-)

Finally, you don't indicate how you use $temp in your validation. I think a much simpler way to validate an existing variable (and more flexible) is to use a case statement. Try

 # print -u2 is the ksh way of writing to stderr
 case ${file_timestamp} in
     20[0-9][0-9][0-1][0-9][0-3][0-9][0-2][0-9][0-5][0-9][0-5][0-9] )
       print -u2 -- "dbg: valid : file_timestamp=${file_timestamp}"
       # do other good stuff here
     ;;
     * )
       print -u2 -- "dbg:NOT valid : file_timestamp=${file_timestamp} "
       # do other error reporting or fixing here
     ;;
 esac

Now you're avoiding the extra processes for the $( ... ) and the grep.

If you need your grouping chars like (2[0-9]), then you'll have to use grep (sed, awk, etc.), as plain shell case patterns don't support ( ) grouping. ksh does have its own extended pattern form, @(a|b), but it is not portable to every shell.
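One possible way to keep the strict field ranges of the original reg-exp without grouping and without an extra process is to run several small case tests, each listing whole alternative patterns separated by |. The following is a sketch only: the function name is_valid_timestamp is made up for illustration, and the ? wildcards simply skip over the fields that an earlier test has already shown to be digits.

```shell
# is_valid_timestamp VALUE   (hypothetical helper name)
# Returns 0 when VALUE has the shape YYYYMMddhhmmss with every field in range.
is_valid_timestamp() {
    ts=$1

    # Overall shape: "20" followed by exactly twelve more digits.
    case $ts in
        20[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) return 1 ;;
    esac

    # Month 01-12 (characters 5-6).
    case $ts in
        ????0[1-9]* | ????1[0-2]*) ;;
        *) return 1 ;;
    esac

    # Day 01-31 (characters 7-8).
    case $ts in
        ??????0[1-9]* | ??????[12][0-9]* | ??????3[01]*) ;;
        *) return 1 ;;
    esac

    # Hour 00-23 (characters 9-10).
    case $ts in
        ????????[01][0-9]* | ????????2[0-3]*) ;;
        *) return 1 ;;
    esac

    # Minute and second 00-59 (characters 11-14).
    case $ts in
        ??????????[0-5][0-9][0-5][0-9]) ;;
        *) return 1 ;;
    esac

    return 0
}

# Usage: the function's exit status drives the if.
if is_valid_timestamp "20121226142100"; then
    echo "valid"
else
    echo "NOT valid" >&2
fi
```

This is longer than a single pattern, but each test covers one field and can be commented on its own, and everything runs inside the shell.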

IHTH

Is there any inbuilt Unix function to test whether a date entered is a valid date or not? Through a regular expression I can limit the date to the pattern, but I am unable to restrict the user from giving certain values, e.g. 30 Feb. — Sumit Sahu, Dec 27 '12 at 15:03
No, sorry, no inbuilt function for date validations. Linux versions of the date command can convert many input date formats to different output formats, but require specifying the expected format (essentially a regular expression). You can of course chain multiple tests together and have a test for `30 Feb` as needed. So now I think I understand the intent of your original question. If you want to reject illegal date values, like `30 Feb`, the easiest thing to do is to use specific reg-exps in the case statement above that will flag the error. They would go before the `Good date regex`. — shellter, Dec 27 '12 at 15:14
There are many questions here on S.O. already about converting and validating dates. You should search for `[linux] date convert` or `[linux] date validate`, substituting `[unix]` for `[linux]` if appropriate. If you find my answer has pointed you in the right direction to solving your problem, please accept it as a correct answer. Good luck. — shellter, Dec 27 '12 at 15:15
I tried formatting using date -d, but I get "illegal option" — Sumit Sahu, Dec 27 '12 at 15:27
The only way I know to get `-d $date` functionality is from the GNU date utility, but you'll get a better answer posting a question at http://serverfault.com/ like "I need the GNU date functionality of `date -d ${date}` on my Sun OS ver XXX? server. What packages do I need to install? Or is there another way to get this functionality". (I'm not a sysadmin, so I only know vaguely about this stuff). Note that you'll likely require root access to have this installed. For your future questions, please add a tag for SunOS, as this greatly affects the usefulness of answers you will get. Good luck. — shellter, Dec 28 '12 at 17:01
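Where GNU date is not available, the chained tests mentioned in the comments for cases like `30 Feb` might be sketched in plain shell as follows. This is an assumption-laden illustration rather than a standard facility: the function name is_real_date is made up, it looks only at the first eight characters (YYYYMMdd), it expects a year starting with 20 as in the question, and it applies the usual Gregorian leap-year rule.

```shell
# is_real_date VALUE   (hypothetical helper name)
# Returns 0 when the leading YYYYMMdd of VALUE names a day that exists.
is_real_date() {
    d=$1

    # Shape check: "20" plus six more digits; anything may follow.
    case $d in
        20[0-9][0-9][0-9][0-9][0-9][0-9]*) ;;
        *) return 1 ;;
    esac

    # Cut out the fields with parameter expansion only (no external process).
    year=${d%"${d#????}"}        # first four characters
    rest=${d#????}
    month=${rest%"${rest#??}"}   # next two characters
    rest=${rest#??}
    day=${rest%"${rest#??}"}     # next two characters

    # Drop one leading zero so 08 and 09 are not treated as octal numbers.
    month=${month#0}
    day=${day#0}

    # Largest day number allowed in this month.
    case $month in
        1|3|5|7|8|10|12) max=31 ;;
        4|6|9|11)        max=30 ;;
        2)
            if [ $((year % 4)) -eq 0 ] &&
               { [ $((year % 100)) -ne 0 ] || [ $((year % 400)) -eq 0 ]; }; then
                max=29
            else
                max=28
            fi
            ;;
        *) return 1 ;;
    esac

    [ "$day" -ge 1 ] && [ "$day" -le "$max" ]
}

# Usage: run this after the pattern test has accepted the overall format.
if is_real_date "20120230120000"; then
    echo "date exists"
else
    echo "no such calendar date" >&2
fi
```

The time fields are not examined here, so this check complements the pattern test for the full YYYYMMddhhmmss format instead of replacing it.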
