
I have a large file. Most lines look like this (record number, dot, space, last name, comma, first name):

1. Moore, Roger
2. Connery, Sean
3. ....
100. Dalton, Timothy

Occasionally, though, some unpleasant lines look like this:

110. Bronson, Pierce  111. Gomez, Selena 112. Portman, Nathalie

I need a regular expression to break those unpleasant lines up like this:

110. Bronson, Pierce  
111. Gomez, Selena 
112. Portman, Nathalie

Some lines may have two records, but others may have five or more. They show up when I copy and paste a PDF document into TextWrangler, which is the editor I use.

2 Answers


I haven't used TextWrangler in years, but it has regex capabilities. You need to do a Find and Replace with a regex.

Here is a working regex that shows the identification of all the lines with extra numbered entries.

You want to replace what it matches with something like

\n$1

where the \n is a newline character and the $1 is the text captured in the match, so it should result in

  110. Bronson, Pierce 111. Gomez, Selena 112. Portman, Nathalie

going to

  110. Bronson, Pierce
  111. Gomez, Selena
  112. Portman, Nathalie
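
For anyone who would rather script this than use the editor, here is a minimal Python sketch of the same capture-and-replace idea. The pattern is an assumption for illustration, not necessarily the regex referred to above: it matches a run of spaces or tabs followed by a record number and a dot, and captures the number so the replacement can put it back. Python writes the back-reference as \1 where the editor uses $1, and the name split_with_capture is hypothetical.

```python
import re

# Assumed pattern (not taken from the answer): spaces or tabs followed by
# a record number and a dot. The number is captured as group 1 so that
# the replacement can re-insert it after the newline.
RECORD_START = re.compile(r"[ \t]+(\d+\.)")


def split_with_capture(line: str) -> str:
    """Put every record that follows another record on its own line."""
    # In a Python replacement template, \n is a newline and \1 is group 1.
    return RECORD_START.sub(r"\n\1", line)


line = "110. Bronson, Pierce  111. Gomez, Selena 112. Portman, Nathalie"
print(split_with_capture(line))
# 110. Bronson, Pierce
# 111. Gomez, Selena
# 112. Portman, Nathalie
```

A line that holds a single record has no whitespace directly before a record number, so it passes through unchanged.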
Shawn Mehan

Regex:  +(?=\d+\.) (the pattern begins with a literal space) or \s+(?=\d+\.)

Substitution: \n

Details:

  • \s Matches any whitespace character (equal to [\r\n\t\f\v ])
  • + Matches between one and unlimited times
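  • (?=...) Positive lookahead: the text that follows must match, but it is not consumed, so the record number stays in place
  • \d matches a digit (equal to [0-9])
  • \. matches a literal dot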
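
To try the lookahead version outside the editor, a small Python sketch along these lines should behave the same way. It uses the \s+(?=\d+\.) pattern from above with Python's standard re module. The function name split_with_lookahead and the sample text are made up for illustration.

```python
import re

# Whitespace that is immediately followed by digits and a dot.
# The lookahead only checks for the record number; it does not consume it,
# so the replacement needs no back-reference.
SEPARATOR = re.compile(r"\s+(?=\d+\.)")


def split_with_lookahead(text: str) -> str:
    """Replace the whitespace before each record number with one newline."""
    return SEPARATOR.sub("\n", text)


text = (
    "100. Dalton, Timothy\n"
    "110. Bronson, Pierce  111. Gomez, Selena 112. Portman, Nathalie"
)
print(split_with_lookahead(text))
# 100. Dalton, Timothy
# 110. Bronson, Pierce
# 111. Gomez, Selena
# 112. Portman, Nathalie
```

Because \s also matches newlines, the line breaks that already sit before a record number are matched too and replaced with a single newline, so well-formed lines come out unchanged.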
Srdjan M.