0

I have some fairly dirty text tab data that I need to import in SQL Server. It's failing because one of the column delimiters goes out of whack. This is because double quoting is occuring in some of the columns. I'm struggling to put it right, my theory would be to remove all quotes and then come up with a regex to wrap each tab seperated field again with single quotes. Despite scouring lots of similar stack posts I can't find an answer that works, can anyone help?

It doesn't have to be Notepad++ but it's what I'm most familiar with.

    "REDACTED"  "REDACTED"  REDACTED    "REDACTED"  "REDACTED"  "REDACTED"  ""  "REDACTED"    "REDACTED"    "REDACTED"  ""Problematic, field""  ""  ""  ""  "REDACTED"  "REDACTED"  "REDACTED"  ".00"   "805400838" 94.17   22.77   4.13    ".83"   117.95  220 2   0   "REDACTED"
PocketTim
  • 87
  • 10
  • 1
    What is the expected result? – Toto Dec 11 '19 at 12:29
  • Every string must be wrapped in single quotes even if it's blank. I've edited my post so in the example you can see where double quotes have appeared around a field, likely due to the supplier using " as an escape. If a value or date ends up wrapped in quotes that's not the end of the world as I can fix that post import. – PocketTim Dec 11 '19 at 12:33
  • 1
    So, you wan to replace all double quotes in single quotes, what about spaces or word without quote? It's not clear to me, please, simplify a bit your example and **add expected result**. – Toto Dec 11 '19 at 12:44

1 Answers1

0

Without further explanations on your demand, I guess you want to remove all double quotes and add single quotes arround each field.

This does the job:

  • Ctrl+H
  • Find what: (?<=\s)""(?=\s|$)|(?<=\s)([^"\s]+)|(?<!")("+)([^"]+)\2
  • Replace with: (?1'$1':'$3')
  • CHECK Wrap around
  • CHECK Regular expression
  • Replace all

Explanation:

  (?<=\s)           # positive lookbehind, make sure we have a space before
  ""                # 2 double quote
  (?=\s|$)          # positive lookahead, make sure we have aspace after or end of line
|               # OR
  (?<=\s)           # positive lookbehind, make sure we have a space before
  ([^"\s]+)         # group 1, 1 or more any characte that is not space or double quote
|               # OR
  (?<!")            # negative lookbehind, make sure we haven't a double quote before
  ("+)              # group 2, 1 or more double quote
  ([^"]+)           # group 3, 1 or more non double quote
  \2                # reference to group 2, same number of double quote

Replacement:

(?1             # conditional replacement, if group 1 exists
  '$1'              # content of group 1 surrounded with single quotes
 :              # else
  '$3'              # content of group 3 surrounded with single quotes
)               # end condition

Screen capture (before):

enter image description here

Screen capture (after):

enter image description here

Toto
  • 89,455
  • 62
  • 89
  • 125