Using regex to get rid of newlines in addresses

Question

I have a set of address data with three main columns: policy number, address, and index number. Some of the addresses contain newlines in the middle, which I want to get rid of. But I don't want to remove the newlines that separate one data row from the next. I am using TextPad and trying to write a regular expression that locates only those specific newlines, so I can delete them with search and replace.

Each index number is a random number followed by "_CDB", so I have been trying to write a regular expression that deletes every newline that is not preceded by "_CDB". My current expression uses a negative lookbehind: (?<!_CDB)\n. However, it still seems to match every newline rather than just those that are not preceded by "_CDB".
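To see how the lookbehind can end up matching every line, here is a small Python sketch. Python's re module is only a stand-in here, and TextPad's regex engine may differ in details. It tests two assumed causes: Windows \r\n line endings, where the character right before each \n is \r, and trailing spaces after _CDB. In both cases (?<!_CDB)\n no longer sees _CDB directly in front of the newline. The names LOOKBEHIND and count_hits are hypothetical.

```python
import re

# The question's pattern: a '\n' that is NOT immediately preceded by "_CDB".
LOOKBEHIND = re.compile(r"(?<!_CDB)\n")

# Two rows from the sample file shown in the answer, with Unix '\n' endings.
SAMPLE = (
    "321321312, 1111 deer park road\nkenosha\nwi\n53144, 1111_CDB\n"
    "321321312, 222 deer park road\nkenosha\nwi\n53144, 222_CDB\n"
)

def count_hits(text: str) -> int:
    """How many newlines the lookbehind pattern would select for deletion."""
    return len(LOOKBEHIND.findall(text))

lf_only = SAMPLE                                      # '\n' endings
crlf = SAMPLE.replace("\n", "\r\n")                   # Windows '\r\n' endings
trailing_space = SAMPLE.replace("_CDB\n", "_CDB \n")  # a space after _CDB

for label, text in [("LF", lf_only), ("CRLF", crlf), ("trailing space", trailing_space)]:
    total = text.count("\n")
    print(f"{label}: {count_hits(text)} of {total} newlines matched")
# LF: 6 of 8 newlines matched
# CRLF: 8 of 8 newlines matched
# trailing space: 8 of 8 newlines matched
```

With plain \n endings the lookbehind behaves as intended; with either assumed cause, it selects the row separators too.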

I'd appreciate it if someone could point out where I am going wrong, or suggest another way of eliminating these newlines in the middle of addresses.

Thanks

Can you show a sample of your file? – Jorge Campos, Jun 15 '16 at 01:10

score 1 · Answer 1 · answered Jun 15 '16 at 02:13

Description

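You're probably getting hung up on lines that have spaces at the end of the line, or on carriage returns: with Windows \r\n line endings, the character right before each \n is \r, not _CDB, so the lookbehind never rules a newline out. I'd simply match every line-break character, capture _CDB together with the line break that follows it, and then replace each match with that captured group: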

(_CDB\s*[\n\r]+)|[\n\r]

Replace With: $1
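If you'd rather script this than use TextPad's dialog, the same find/replace can be sketched in Python. Python writes the $1 backreference as \1, and since Python 3.5 a group that did not take part in the match is substituted as an empty string, so stray line breaks simply disappear. The function name join_address_lines is hypothetical.

```python
import re

# Group 1: "_CDB", any whitespace after it, then the row-ending line break(s).
# Otherwise: a single '\n' or '\r' inside an address.
ROW_BREAK_OR_STRAY = re.compile(r"(_CDB\s*[\n\r]+)|[\n\r]")

def join_address_lines(text: str) -> str:
    # r"\1" is Python's spelling of "$1". For a stray break, group 1 did not
    # participate, so it is replaced by "" (Python 3.5+) and the break is deleted.
    return ROW_BREAK_OR_STRAY.sub(r"\1", text)

raw = "321321312, 1111 deer park road\nkenosha\nwi\n53144, 1111_CDB\n"
print(join_address_lines(raw), end="")
# 321321312, 1111 deer park roadkenoshawi53144, 1111_CDB
```

Because \r and \n are matched individually, a Windows \r\n inside an address is removed as two separate matches, while the \r\n after _CDB stays inside group 1.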


Example

Live Demo

https://regex101.com/r/qT6nU8/1

Sample text

321321312, 1111 deer park road
kenosha
wi
53144, 1111_CDB
321321312, 222 deer park road
kenosha
wi
53144, 222_CDB
321321312, 333 deer park road
kenosha
wi
53144, 333_CDB
321321312, 4444 deer park road
kenosha
wi
53144, 4444_CDB

After Replacement

321321312, 1111 deer park roadkenoshawi53144, 1111_CDB
321321312, 222 deer park roadkenoshawi53144, 222_CDB
321321312, 333 deer park roadkenoshawi53144, 333_CDB
321321312, 4444 deer park roadkenoshawi53144, 4444_CDB
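The output above glues the address parts together ("roadkenoshawi53144") because the stray breaks are replaced with nothing. If you'd rather keep a separator, one option is the Python sketch below. It is a variation, not part of the original answer: it assumes a replacement callback, and it changes the second alternative to [\n\r]+ so that a Windows \r\n inside an address becomes one separator instead of two. The function name join_with_separator is hypothetical.

```python
import re

# Variation on the answer's pattern (an assumption, not the original): the
# second alternative is [\n\r]+ so a Windows "\r\n" inside an address becomes
# ONE separator instead of two.
ROW_BREAK_OR_STRAY = re.compile(r"(_CDB\s*[\n\r]+)|[\n\r]+")

def join_with_separator(text: str, sep: str = " ") -> str:
    """Join address lines with sep, keeping the line break after each _CDB."""
    def repl(m):
        if m.group(1) is not None:
            return m.group(1)  # row break after _CDB: keep it unchanged
        return sep             # break inside an address: use the separator
    return ROW_BREAK_OR_STRAY.sub(repl, text)

raw = "321321312, 1111 deer park road\nkenosha\nwi\n53144, 1111_CDB\n"
print(join_with_separator(raw, ", "), end="")
# 321321312, 1111 deer park road, kenosha, wi, 53144, 1111_CDB
```

A function is used because a plain replacement template cannot say "group 1 if it matched, otherwise a separator".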

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    _CDB                     '_CDB'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [\n\r]+                  any character of: '\n' (newline), '\r'
                             (carriage return) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  [\n\r]                   any character of: '\n' (newline), '\r'
                           (carriage return)
----------------------------------------------------------------------
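The table above can also be kept next to the pattern itself. Below is a sketch of the same expression in Python's re.VERBOSE mode, where unescaped whitespace in the pattern is ignored and # starts a comment, so each node sits beside its explanation. Python is an assumption here (TextPad's search dialog takes the one-line form), and the names VERBOSE_PATTERN and ONE_LINER are hypothetical.

```python
import re

# Same pattern as the one-liner, laid out node by node with re.VERBOSE.
VERBOSE_PATTERN = re.compile(
    r"""
    (               # group and capture to \1:
        _CDB        #   the literal text '_CDB'
        \s*         #   whitespace, 0 or more times (greedy)
        [\n\r]+     #   one or more newline / carriage-return characters
    )               # end of \1
    |               # OR
    [\n\r]          # a single newline or carriage return
    """,
    re.VERBOSE,
)

ONE_LINER = re.compile(r"(_CDB\s*[\n\r]+)|[\n\r]")

sample = "321321312, 1111 deer park road\nkenosha\nwi\n53144, 1111_CDB\n"
print(VERBOSE_PATTERN.sub(r"\1", sample) == ONE_LINER.sub(r"\1", sample))  # True
```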
