2

I am attempting to clean up a flat file in a C-ISAM database that has corrupt data. The flat file is fixed position delimited, and the field I'm trying to clean up is a date field. The field uses an eight digit format (YYYYMMDD). What I am trying to do is use a text editor with regex capabilities to find any row where this column does not contain eight digits and replace it with a null value.

Because the true data that would have been there is irrecoverable and because the conversion scripts throw away invalid dates, the date does not need to be valid, only the format. In other words, 20160512 and 99999999 would be valid values, but all blanks is not, nor would "   123  ".

I don't have a great grasp of RegEx because I end up working with it so infrequently, but I'm most of the way there:

(?<=^.{1018})[^\d]{8}

This skips the first 1018 columns in a row and selects the next 8 characters if there are no digits within them. This catches the first 'not valid' example I gave but not the second. I need one that does both.

K_Cruz
  • 729
  • 6
  • 17
  • Well, looks like you just need `^(.{1018})(?!\d{8}).{8}` and replace with `$1` (or `\1` depending on your regex flavor). See the [demo](https://regex101.com/r/aH8wB5/2). What tool are you going to use? – Wiktor Stribiżew May 12 '16 at 20:31

1 Answers1

0

It looks like you need

^(.{1018})(?!\d{8}).{8}

Replace with $1 (or \1 depending on your regex flavor).

See a simplified demo.

Pattern details:

  • ^ - start of line/string
  • (.{1018}) - Group 1 matching 1018 characters other than a newline
  • (?!\d{8}) - a negative lookahead failing the match if the following 8 characters are digits
  • .{8} - match, consume, 8 characters that are not all digits.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563