Currently I am using TextWrangler (Mac) with its grep find/replace, but I would be just as happy to use any other editor or command-line tool.

I have a text file structured like this (yes, there is a space at the beginning of each line):

 Reference 1 -  This is a sentence with a period. And this exclaims! So does this one!
 Reference 2 -  This questions? And this, this one responds. But this YELLS!

And I need to keep the reference, but break each sentence into its own line, like this:

 Reference 1 -  This is a sentence with a period.
 Reference 1 -  And this exclaims!
 Reference 1 -  So does this one!
 Reference 2 -  This questions?
 Reference 2 -  And this, this one responds.
 Reference 2 -  But this YELLS!

I can get it to keep the reference and the last sentence with this (I copied a literal newline character into the pattern, which is why there is a break at the end; otherwise it matched the rest of the document):

^([^-]+ -\s+)(?:([^.!?]+?[.!?]))(([^.!?]+?[.!?])+?)$    

The replace is like this:

\1\2
\1\3

And the results look like this:

 Reference 1 -  This is a sentence with a period.
 Reference 1 -   And this exclaims! So does this one!

 Reference 2 -  This questions?
 Reference 2 -   And this, this one responds. But this YELLS!

If I run this several times, it never separates the other two sentences onto new lines. But if I add another line to the replace pattern:

\1\4

Then I get this as a result:

 Reference 1 -  This is a sentence with a period.
 Reference 1 -   And this exclaims! So does this one!
 Reference 1 -   So does this one!

 Reference 2 -  This questions?
 Reference 2 -   And this, this one responds. But this YELLS!
 Reference 2 -   But this YELLS!
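
The output above is consistent with how a repeated capturing group usually behaves: it keeps only the text from its last iteration, so `\4` can only ever hold the final sentence. Here is a minimal sketch of that, using Python's `re` module as a stand-in for TextWrangler's grep engine (an assumption: the two engines are not the same, although they should agree on this point):

```python
import re

# The search pattern from above, written as a Python raw string.
# Assumption: Python's re stands in for TextWrangler's grep engine.
PATTERN = re.compile(r"^([^-]+ -\s+)(?:([^.!?]+?[.!?]))(([^.!?]+?[.!?])+?)$")

line = " Reference 1 -  This is a sentence with a period. And this exclaims! So does this one!"
groups = PATTERN.match(line).groups()

print(repr(groups[1]))  # 'This is a sentence with a period.'
print(repr(groups[2]))  # ' And this exclaims! So does this one!'  (every iteration)
print(repr(groups[3]))  # ' So does this one!'  (last iteration only)
```

Group 3 spans every remaining sentence, while group 4 is overwritten on each repetition, which is why the middle sentence never gets a line of its own.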

My hope is that this is pretty simple and I am just missing a switch/modifier/etc.

If I can do just one sentence at a time, I don't mind doing other cleaning runs.

Any ideas?

Palmtree

1 Answer

What about:

Search:
  ^( [^-]+-\s+)(.*[.!?]) *(.*[.!?])

Replace:
  \1\2
  \1\3

I had to run it through a few times, but it seemed to match your target pattern.
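
Since you said a command-line tool would be fine, here is a rough Python sketch that does the split in one pass instead of repeated find/replace runs. The function name `split_sentences` and both patterns are my own for illustration. It assumes every reference prefix ends at the first hyphen on the line and that a sentence ends at `.`, `!` or `?`:

```python
import re

# Prefix such as " Reference 1 -  " (everything up to the first hyphen,
# plus the whitespace after it), followed by the rest of the line.
LINE = re.compile(r"^(\s*[^-]+-\s+)(.*)$")

# One sentence: text up to and including its terminator(s),
# or whatever unterminated text is left at the end of the line.
SENTENCE = re.compile(r"[^.!?]+(?:[.!?]+|$)")


def split_sentences(text: str) -> str:
    """Return text with each sentence on its own line, prefix repeated."""
    out = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            out.append(line)  # blank or unprefixed lines pass through unchanged
            continue
        prefix, body = m.groups()
        sentences = [s.strip() for s in SENTENCE.findall(body)]
        sentences = [s for s in sentences if s]
        if not sentences:
            out.append(line)
            continue
        out.extend(prefix + s for s in sentences)
    return "\n".join(out)


sample = (
    " Reference 1 -  This is a sentence with a period. And this exclaims! So does this one!\n"
    " Reference 2 -  This questions? And this, this one responds. But this YELLS!"
)
print(split_sentences(sample))
```

On the two sample lines from the question, this prints the six lines you listed as the target. It will split in the wrong place on abbreviations such as "e.g." or on decimal numbers, so check the output if your real data contains those.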

OnlineCop

After trying it, I got exactly the same thing. TextWrangler's grep supports non-greedy matching, so you can use `(.+?[.!?])`, but it seems to come down to the same result. – Jongware, Jan 25 '14 at 00:29