
I'm trying to use a RegEx that will match any pre-1980 VIN (Vehicle Identification Number). Pre-1980 VINs are not required to be 17 characters long and can contain 'O's and 'U's.

This site claimed to have a RegEx for it (it also explains the VIN rules): http://lamptricks.blogspot.com/2012/03/vin-regex-pre-1980-and-new.html

Here's the RegEx: `^((([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])([a-h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6}))|(([a-h,A-H,j-z,J-Z,0-9]{6,11})(\d{5})))$`

But the following VIN did not pass the test: BCG23253

It ends in 5 digits and is 8 characters long, which this RegEx seems to account for... Is the VIN faulty, or is the RegEx?

mint
  • It says older VINs must be 11-16 chars. A length of 8 doesn't seem valid. See this part: `(([a-h,A-H,j-z,J-Z,0-9]{6,11})(\d{5}))` It requires 6-11 chars before 5 digits. – Wiseguy Mar 30 '12 at 19:03
  • @Wiseguy Good point... maybe it's bad data then. – mint Mar 30 '12 at 19:04
  • As an aside, for readability I would remove the duplicated ranges of letters (one for lowercase, one for uppercase -- just keep one or the other) and make it case-insensitive in whatever way your regex engine supports. – Wiseguy Mar 30 '12 at 19:07
  • We need more information here. Please describe: a) The VINs you're looking for, b) the VINs you're NOT looking for, and c) the file or string they're contained in. Is the file full of VINs and nothing else? Are they separated by spaces? Commas? We need **specific** rules if we're going to write a regex to satisfy them. – Justin Morgan - On strike Mar 30 '12 at 20:35
  • Also, what flavor of regex are you using? – Justin Morgan - On strike Mar 30 '12 at 20:37
  • @JustinMorgan The OP asked why the provided VIN did not match the provided RegEx, not for you to write a RegEx for him. – Madbreaks Mar 30 '12 at 22:13
  • @Madbreaks - My answer below explains why the VIN doesn't match the provided regex. My take on this is that he's asking for help solving his problem; but no matter what he wants us to do, we need more specifics in order to help him. You can ignore the last sentence in my first comment if you prefer. – Justin Morgan - On strike Mar 31 '12 at 06:58
  • Thanks for the help, everyone. As you all said, it comes down to the fact that the data I'm expecting is going to be flawed and won't follow the rules the RegEx enforces. So it's impossible to write a VIN RegEx if the input isn't going to follow the VIN rules. – mint Apr 02 '12 at 14:43

2 Answers

That RegEx is hard to read, but look at this, after the or operator:

(([a-h,A-H,j-z,J-Z,0-9]{6,11})(\d{5})))$

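...says "between 6 and 11 characters from the preceding character class, followed by 5 digits", so that alternative only accepts strings 11-16 characters long. Your 8-character sample VIN does not meet that criterion.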
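
To reproduce this, here is a minimal sketch in Python's `re` (the engine is an assumption; the question doesn't say which one is in use). It runs the regex exactly as posted against the sample VIN and against a made-up 11-character string:

```python
import re

# The regex exactly as posted in the question (commas and all).
POSTED = re.compile(
    r"^((([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])"
    r"([a-h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6}))|(([a-h,A-H,j-z,J-Z,0-9]{6,11})(\d{5})))$"
)

def matches(vin):
    """Return True if the whole string satisfies the posted regex."""
    return POSTED.match(vin) is not None

# Only 3 characters precede the final 5 digits, but the second alternative needs 6 to 11.
print(matches("BCG23253"))     # False
# 6 characters + 5 digits = 11, the shortest length the second alternative accepts.
print(matches("BCGBCG23253"))  # True
```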

Madbreaks
  • I suspect that you want to remove the commas inside the square brackets; each one matches a literal comma. (I tried submitting this as an edit, but the Stack Overflow software rejected it because it changed fewer than six characters! Anyway, you can edit it yourself.) – thb Mar 30 '12 at 19:13
  • @thb It's a direct copy/paste of op's RegEx. The issue is not the commas, it's that the sample VIN is invalid. – Madbreaks Mar 30 '12 at 19:25
  • @Madbreaks While the commas don't cause an error per se, they do allow for a character that should not be allowed. They are neither necessary nor desired. – Wiseguy Mar 30 '12 at 19:34
  • @Wiseguy Again, not my RegEx. What's the confusion here? – Madbreaks Mar 30 '12 at 19:43
  • No confusion. At a minimum, you did answer why the asker's data didn't match -- that was the question. And I upvoted you for it. Your answer is not "wrong" for keeping the commas, but you would be doing a greater service if you also highlighted other potential errors with it, offering a more correct/useful answer. – Wiseguy Mar 30 '12 at 19:51
  • @Wiseguy Fair enough. One thing I noted was that a requirement of the given RegEx was that it be OS- and engine-independent. I cannot say for certain that removing the commas would retain the intended effect given that requirement. Can you? Sincere question. – Madbreaks Mar 30 '12 at 19:57
  • This is anecdotal, of course, but no regex engine I've seen would use commas like that; they all would treat that as a literal comma. Supposing there is a regex engine that requires such commas, the commas would be treated differently (as an accepted character) in most other engines and thus would fail at its goal of engine independence. For that matter, it already fails at that goal because POSIX doesn't support `\d` shorthand for digits. – Wiseguy Mar 30 '12 at 20:09
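
A quick way to see the commas point in practice, sketched in Python's `re` (an assumed engine): the commas inside the character classes are literal, so the posted second alternative accepts a string containing a comma, while the comma-free version rejects it. The test string is hypothetical.

```python
import re

# Second alternative of the posted regex, commas included.
WITH_COMMAS = re.compile(r"^[a-h,A-H,j-z,J-Z,0-9]{6,11}\d{5}$")
# The same alternative with the commas removed.
WITHOUT_COMMAS = re.compile(r"^[a-hA-Hj-zJ-Z0-9]{6,11}\d{5}$")

sample = "BCG,BC23253"  # hypothetical input containing a stray comma
print(bool(WITH_COMMAS.match(sample)))     # True: the comma is matched as a literal character
print(bool(WITHOUT_COMMAS.match(sample)))  # False
```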

First of all, the regex you found needs some work. I think the author doesn't understand what commas mean inside character classes, for one thing. If you ignore the needless commas and the capture groups, you can simplify the whole thing to this:

/^([a-hj-np-z0-9]{9}[a-hj-npr-tv-z0-9][a-hj-np-z0-9]\d{6}|[a-hj-z0-9]{6,11}\d{5})$/i
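
One way to sanity-check a simplification like this is to compare, character by character, what each class accepts. The sketch below (Python's `re`, an assumed engine) compares the posted classes with their commas removed against lowercase-only versions compiled with `re.IGNORECASE`:

```python
import re
import string

# Each class from the posted regex, with the stray commas removed.
POSTED_CLASSES = [
    "[a-hA-Hj-nJ-Np-zP-Z0-9]",          # positions 1-9 and 11 of the 17-character form
    "[a-hA-Hj-nJ-NpPr-tR-Tv-zV-Z0-9]",  # position 10 of the 17-character form
    "[a-hA-Hj-zJ-Z0-9]",                # prefix of the shorter form
]
# Lowercase-only equivalents, meant to be used with re.IGNORECASE.
SIMPLIFIED_CLASSES = [
    "[a-hj-np-z0-9]",
    "[a-hj-npr-tv-z0-9]",
    "[a-hj-z0-9]",
]

def accepted(char_class, flags=0):
    """Return the ASCII letters and digits that a single-character class accepts."""
    pattern = re.compile(char_class, flags)
    return {c for c in string.ascii_letters + string.digits if pattern.fullmatch(c)}

for posted, simple in zip(POSTED_CLASSES, SIMPLIFIED_CLASSES):
    same = accepted(posted) == accepted(simple, re.IGNORECASE)
    print(simple, "equivalent" if same else "DIFFERENT")
```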

...and then further, if your regex engine supports character class subtraction (.NET does, for example):

/^((?!.{9}[qu])[a-z0-9-[io]]{11}\d{6}|[a-hj-z0-9]{6,11}\d{5})$/i
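
Python's `re` (used here only as an assumed example engine) has no class subtraction, but the lookahead idea still works with an explicit class. This sketch checks that the lookahead form agrees with an explicit-class form on a few hypothetical sample strings:

```python
import re

# Explicit form: the 10th character gets its own class (no Q or U).
EXPLICIT = re.compile(
    r"^([a-hj-np-z0-9]{9}[a-hj-npr-tv-z0-9][a-hj-np-z0-9]\d{6}|[a-hj-z0-9]{6,11}\d{5})$",
    re.IGNORECASE,
)
# Lookahead form: reject Q or U at index 9 (the 10th character), then one class for all 11.
LOOKAHEAD = re.compile(
    r"^((?!.{9}[qu])[a-hj-np-z0-9]{11}\d{6}|[a-hj-z0-9]{6,11}\d{5})$",
    re.IGNORECASE,
)

# Hypothetical sample inputs.
samples = ["1HGCM82633A004352", "1HGCM8263UA004352", "BCG23253", "BCGBCG23253"]
for s in samples:
    print(s, bool(EXPLICIT.match(s)), bool(LOOKAHEAD.match(s)))
```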

That being said, the number you gave (BCG23253) doesn't satisfy the requirement because it's only 8 characters long. To satisfy the bare minimum requirements (the [a-hj-z0-9]{6,11}\d{5} part above), your input would have to be 11-16 characters long, end in 5 digits, and not include the letter I.
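
Because every class here accepts digits, feeding all-digit strings through the pattern is a quick way to see which lengths it allows. A small Python sketch (the engine choice is an assumption) using a comma-free version of the posted regex:

```python
import re

# Comma-free version of the posted regex, and its shorter alternative on its own.
FULL = re.compile(
    r"^([a-hj-np-z0-9]{9}[a-hj-npr-tv-z0-9][a-hj-np-z0-9]\d{6}|[a-hj-z0-9]{6,11}\d{5})$",
    re.IGNORECASE,
)
SHORT_FORM = re.compile(r"^[a-hj-z0-9]{6,11}\d{5}$", re.IGNORECASE)

def accepted_lengths(pattern, max_len=25):
    """All-digit strings satisfy every character class, so this isolates the length rule."""
    return [n for n in range(1, max_len + 1) if pattern.match("1" * n)]

print(accepted_lengths(SHORT_FORM))  # [11, 12, 13, 14, 15, 16]
print(accepted_lengths(FULL))        # [11, 12, 13, 14, 15, 16, 17]
print(len("BCG23253"))               # 8
```

So the shorter alternative tops out at 16 characters, and only the 17-character form covers length 17.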

So BCG23253 shouldn't pass, but BCGBCG23253 (for example) would. As I said in my comment above, I think we need more information about the specific matches you're looking for. It sounds to me like the regex you've posted is made for matching post-1980 VINs, not pre-1980 VINs. Either that, or BCG23253 isn't a valid VIN after all.

Justin Morgan - On strike