regular expressions, text wrangler, inserting line breaks in certain places

Question

I have a large file. Most lines are like this (record number dot space last name, first name)

1. Moore, Roger
2. Connery, Sean
3. ....
100. Dalton, Timothy

Occasionally, some unpleasant lines look like this:

110. Bronson, Pierce  111. Gomez, Selena 112. Portman, Nathalie

I need a regular expression to break those unpleasant lines up like this:

110. Bronson, Pierce  
111. Gomez, Selena 
112. Portman, Nathalie

Some lines may have two records, but others may have five or more. They appear when I copy and paste a PDF document into TextWrangler, which is the editor I use.

you could insert a newline in front of every `\d+` that is not anchored by a `^`. — Shawn Mehan, Feb 17 '18 at 20:40
I am really a novice, could you please type up your reply as a command that I could enter in search replace box text wrangler? — mysql_python-connect_user, Feb 17 '18 at 20:51

score 1 · Accepted Answer · answered Feb 17 '18 at 20:56

I haven't used Text Wrangler in years, but it has regex capabilities. You need to Find and Replace with a regex.

Here is a working regex that shows the identification of all the lines with extra numbered entries.

You want to replace what it matches with something like

\n$1

where the \n is a newline character and the $1 is the text captured in the match, so it should result in

Bronson, Pierce 111. Gomez, Selena 112. Portman, Nathalie

going to

Bronson, Pierce

Gomez, Selena

Portman, Nathalie
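
For anyone who wants to try the capture-and-replace idea outside the editor, here is a minimal Python sketch. The pattern is an assumption (one or more spaces followed by a record number and a dot), not necessarily the regex linked in this answer, and the names `RECORD_START` and `split_records` are made up for illustration. It keeps the record numbers, as in the output the question asks for.

```python
import re

# Assumed pattern: one or more spaces, then a record number and a dot.
# The number and dot are captured so they can be re-inserted after the newline.
RECORD_START = re.compile(r" +(\d+\.)")


def split_records(line: str) -> str:
    """Put each embedded record on its own line, keeping its number."""
    # \1 re-inserts the captured "111." after the newline; the spaces
    # that preceded it are dropped because they sit outside the group.
    return RECORD_START.sub(r"\n\1", line)


line = "110. Bronson, Pierce  111. Gomez, Selena 112. Portman, Nathalie"
print(split_records(line))
```

The first record on a line is left alone because nothing precedes its number, and the space after each dot is left alone because it is followed by a letter rather than a digit.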

answered Feb 17 '18 at 20:56

Shawn Mehan

Thanks a lot, it worked. I typed `( )(\d+)` in the search box and `\n\2` in the replace box. – mysql_python-connect_user Feb 17 '18 at 21:19
good. you should close this question out by checking the green tick on this answer. good luck. – Shawn Mehan Feb 17 '18 at 21:27

score 0 · Answer 2 · answered Feb 17 '18 at 20:56

Regex: ` +(?=\d+\.)` (note the leading space) or `\s+(?=\d+\.)`
Substitution: `\n`

Details:

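\s matches any whitespace character (equal to [\r\n\t\f\v ])
+ matches the preceding token between one and unlimited times
(?=...) positive lookahead: requires that the enclosed pattern follows, without consuming it
\d matches a digit (equal to [0-9])
\. matches a literal dot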
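
The same lookahead substitution can be sketched in Python to check it against sample data before running it on the real file. This uses the space-only variant of the pattern; the function name and sample text are illustrative only.

```python
import re

# Spaces that are immediately followed by a record number and a dot.
# The lookahead checks for the number without consuming it, so only
# the spaces themselves are replaced.
GAP_BEFORE_RECORD = re.compile(r" +(?=\d+\.)")


def break_before_records(text: str) -> str:
    """Replace the gap before each embedded record with a newline."""
    return GAP_BEFORE_RECORD.sub("\n", text)


sample = (
    "1. Moore, Roger\n"
    "110. Bronson, Pierce  111. Gomez, Selena 112. Portman, Nathalie"
)
result = break_before_records(sample)
print(result)
```

Because the number is never part of the match, the replacement needs no backreference. With the `\s+` variant, existing line breaks in front of a record number also match and are replaced by a single newline, so blank lines before a record would collapse.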

answered Feb 17 '18 at 20:56

Srdjan M.
