1

I am using Notepad++ (version 6.9.2)

I have a fixed width flat file database that I need to re-arrange and cleanup.

Example of file:

0001168500Mel     Maenyey Legal College                                                                       91PO Box 867                                                                                          Lutghjhe                                          403003Jellisa Moneyey                                             07 3660 2551        07 3660 2211        Jellisa@gglegalcollege.com.nz                                                   

Here is my current 'find' regex:

^(.{10})(.{100})(.{2})(.{50})(.{50})(.{50})(.{4})(.{2})(.{60})(.{20})(.{20})(.{80})

and here is my 'replace' regex:

"$1","$2","$3","$4","$5","$6","$7","$8","$9","$10","$11","$12"

These work well however I would like to:

  1. not copy over the padding spaces (right or left)
  2. Remove left '0' (zeros) from numerical values - none in this example file but there are some in other files.

I have searched and can't find examples of where this might be possible in the same regex. I could do another regex after but if there is something quick and easy to add to this one it would be nice ;-)

Scott Spence
  • 99
  • 14
  • 2
    It's easier if you do it in multiple steps. After you've applied the regex you have and everything's neatly separated by commas, you can just replace `"[ 0]+| +"` with `"` – Aran-Fey Jul 24 '16 at 05:53
  • I think you can easily add these post-processing in a Python script. If you are interested, I will "cook" one when I have a spare minute. – Wiktor Stribiżew Jul 24 '16 at 08:00

1 Answers1

0

Your current regex yields this output:

"0001168500","Mel     Maenyey Legal College                                                                       ","91","PO Box 867                                        ","                                                  ","Lutghjhe                                          ","4030","03","Jellisa Moneyey                                             ","07 3660 2551        ","07 3660 2211        ","Jellisa@gglegalcollege.com.nz                                                   "

The problem with a regex approach is that you have fixed width fields, and the number of possible leading zeros and leading/trailing whitespaces is variable. Although in some cases it can be helped with possessive matching and lookbehinds, I'd recommend to use a more flexible, code based approach.

Install the working PythonScript version, and create a script file with the following contents:

import re
def repl(match): 
    return '"{0}","{1}","{2}","{3}","{4}","{5}","{6}","{7}","{8}","{9}","{10}","{11}"'.format(
        digit_reg.sub(r'\1', match.group(1).strip()),
        digit_reg.sub(r'\1', match.group(2).strip()),
        digit_reg.sub(r'\1', match.group(3).strip()),
        digit_reg.sub(r'\1', match.group(4).strip()),
        digit_reg.sub(r'\1', match.group(5).strip()),
        digit_reg.sub(r'\1', match.group(6).strip()),
        digit_reg.sub(r'\1', match.group(7).strip()),
        digit_reg.sub(r'\1', match.group(8).strip()),
        digit_reg.sub(r'\1', match.group(9).strip()),
        digit_reg.sub(r'\1', match.group(10).strip()),
        digit_reg.sub(r'\1', match.group(11).strip()),
        digit_reg.sub(r'\1', match.group(12).strip())
    )

pat = r'^(.{10})(.{100})(.{2})(.{50})(.{50})(.{50})(.{4})(.{2})(.{60})(.{20})(.{20})(.{80})'
digit_reg = re.compile(r'^0+([0-9]*)$')
editor.rereplace(pat, repl)

The pattern you have will be used to match separate fields. The match data object will be passed to the repl method, where each of the fields will be trimmed with .strip() method from the leading/trailing whitespaces, and the ^0+([0-9]*)$ regex will be applied to all the fields to remove leading zeros in strings that only consist of digits.

enter image description here

Community
  • 1
  • 1
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563