
I want to test for string corruption and it would be easiest to use a regex that finds violations of a repeated sequence of characters. Suppose the sequence used for validating integrity is 0 to 9.

For example, '0123456790123456789' would match '79' because '8' is missing.

'01234567555890123456789' would match '75558' because '555' doesn't belong. The specific string returned isn't important, just the flagging of at minimum the first location of corruption.

How can I achieve this with a regular expression, if this is even possible with a regex?

Jeff Axelrod
  • Please use more words and explain more clearly what it is you are looking for. Show sample data. Also, don't assume that the answer to your question is going to use a regular expression. Regexes are not a magic wand you wave at every problem that happens to involve strings. – Andy Lester Feb 18 '15 at 16:28

1 Answer

You could spell out the sequence like this:

(0[^1]|1[^2]|2[^3]|3[^4]|4[^5]|5[^6]|6[^7]|7[^8]|8[^9]|9[^0])

In 01234567555890123456789 it returns two matches, 75 and 55, so it reliably finds at least the first broken link.
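
As a rough illustration of how that could be used, here is a minimal sketch in Python (the language is my assumption, since the question doesn't name one). The names `BROKEN_LINK` and `find_breaks` are made up for this example. It applies the pattern, with the `8[^9]` alternative written out, to the strings from the question:

```python
import re

# Each alternative matches one digit followed by any character
# other than its expected successor (9 wraps around to 0).
BROKEN_LINK = re.compile(
    r"0[^1]|1[^2]|2[^3]|3[^4]|4[^5]|5[^6]|6[^7]|7[^8]|8[^9]|9[^0]"
)

def find_breaks(text):
    """Return (offset, matched_text) for each non-overlapping broken link."""
    return [(m.start(), m.group()) for m in BROKEN_LINK.finditer(text)]

print(find_breaks("0123456790123456789"))      # [(7, '79')]
print(find_breaks("01234567555890123456789"))  # [(7, '75'), (9, '55')]
print(find_breaks("01234567890123456789"))     # []
```

Note that this only inspects pairs of adjacent characters, so it won't notice a string that starts at the wrong digit or is cut short at the end.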

I don't think much more can be done with a regex alone, since a regex has no built-in notion of which character should follow which.
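
If the validation sequence isn't 0 to 9, the explicit pair rules don't have to be typed by hand. The following is only a sketch, assuming Python and assuming every character in the sequence is unique. `build_break_pattern` is a hypothetical helper, not part of any library:

```python
import re

def build_break_pattern(sequence):
    """Build a regex that matches any character of `sequence` followed by
    something other than its successor (the last character wraps to the first).
    Assumes each character appears only once in `sequence`."""
    successors = sequence[1:] + sequence[:1]
    alternatives = [
        f"{re.escape(cur)}[^{re.escape(nxt)}]"
        for cur, nxt in zip(sequence, successors)
    ]
    return re.compile("|".join(alternatives))

pattern = build_break_pattern("abcd")
match = pattern.search("abcdabdabcd")
print(match.start(), match.group())  # 5 bd
```

The ordering still lives outside the regex: the code spells out each "x must be followed by y" rule, and the regex only checks them.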

KekuSemau
  • No need for sorting, really; finding at least one occurrence is sufficient. – Jeff Axelrod Feb 18 '15 at 16:58
  • Yeah, by sorting I meant that a sequence is defined by a set of rules for which element follows which other element. Regex doesn't know those rules even for 0-9, a-z, etc., so it cannot handle sequences except through explicit atomic rules like the ones I tried here. – KekuSemau Feb 18 '15 at 17:07