The general problem
I am trying to understand how to prevent some pattern from appearing immediately before or after a sought-out pattern when writing regexes.
A more specific example
I'm looking for a regex that will match dates in the format YYMMDD, i.e. (([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])), inside a long string while ignoring longer numeric sequences.
It should be able to match:
- text151124moretext
- 123text151124moretext
- text151124
- text151124moretext1944
- 151124
But it should ignore:
- text15112412moretext (reason: it has 8 digits instead of 6)
- 151324 (reason: it is not a valid YYMMDD date - there is no 13th month)
How can I make sure that a number with more than these 6 digits won't be picked up as a date, using one single regex (meaning that I would rather avoid preprocessing the string)?
I've thought of \D((19|20)([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1]))\D
but doesn't this mean that there has to be some character before and after the date, so that a date at the very start or end of the string would be missed?
I'm using bash 3.2 (ERE)
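To make the "no digit on either side" requirement concrete: ERE has no lookarounds, so one possible approach is to accept either a string edge or a non-digit on each side, i.e. (^|[^0-9]) in front and ([^0-9]|$) behind. The sketch below assumes bash's [[ =~ ]] operator; the variable date_re and the helper find_yymmdd are hypothetical names used only for illustration. Because the boundary characters become part of the overall match, the date itself is read from capture group 2.

```shell
#!/usr/bin/env bash

# Hypothetical pattern (an assumption, not from the original post):
#   (^|[^0-9])  start of string OR a non-digit before the date
#   group 2     YY, MM (01-12), DD (01-31)
#   ([^0-9]|$)  a non-digit OR end of string after the date
date_re='(^|[^0-9])([0-9]{2}(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01]))([^0-9]|$)'

# Hypothetical helper: prints the first YYMMDD date found in $1,
# or returns 1 if there is none.
find_yymmdd() {
  # The regex is kept in a variable and expanded unquoted, so bash
  # treats it as a pattern rather than as a literal string.
  if [[ $1 =~ $date_re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}
```

For example, find_yymmdd text151124moretext1944 prints 151124, while find_yymmdd text15112412moretext prints nothing and returns 1. The pattern only range-checks month and day, so it would still accept an impossible date such as 150231.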
thanks!