0

I'm looking for a regexp that finds strings starting with a dash and ending with a dash or a period, in order to manually review cases where dashes must be replaced with em dashes.

For example, the text below:

-hi there.
-hello-.
It's nice -said while looking at the window- if you could come.

needs to be replaced with:

—hi there.
—hello—.
—good morning —he said.

But these dashes must remain unchanged:

1992-1994
MTS-O

Since I don't think a fully automated solution is possible, I'm looking to speed up the manual review with a single regexp that replaces these two:

–(.+?)–
–(.+?)\.

with one that matches either a dash or a period at the end, and lets me do a fast substitution that replaces the closing en dash when that is what matched, or keeps the period when that is what matched.

manuhank

2 Answers

2

Maybe you can settle for a simple pattern as suggested, but that might cause problems with some edge cases. It takes a little more to fulfill all your requirements.

..a regexp formula that finds a string starting with a dash and ending with a dash or a point,

However, if you want to do it in one go, you may need a PCRE pattern like this (Demo):

(?=^-.*[.-]$)-|\G(?!^).*\K-

First, verify the whole string with a lookahead: (?=^-.*[.-]$). If that succeeds, we are at the start of the line.

Then we match the first dash to replace it, followed by a \G-continue alternative that matches a further dash, as long as we are not at the start of a line: (?!^). We skip ahead to a later - with .* (since .* is greedy, this lands on the last dash in the line) and use \K to drop everything before it from the match. Fun, right?

In general, I would suggest using two regexes: one to find/verify the pattern in question, and a second to do the replacement. But that is probably not an option in your environment.
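If a scripting language is available, the two-step idea could look roughly like the sketch below. It assumes Python's standard re module and line-by-line processing; the names CANDIDATE, EM_DASH and replace_line are made up for illustration, and replacing every hyphen in a qualifying line is an assumption that may over-replace on lines containing hyphenated words, so a manual review is still advisable.

```python
import re

# Step 1: a line qualifies if it starts with a hyphen
# and ends with a hyphen or a period.
CANDIDATE = re.compile(r"^-.*[.-]$")

# Step 2: the character to replace inside qualifying lines.
HYPHEN = re.compile(r"-")

EM_DASH = "\u2014"


def replace_line(line: str) -> str:
    """Return the line with hyphens turned into em dashes if it qualifies."""
    if CANDIDATE.match(line):
        # Assumption: every hyphen in a qualifying line is replaced.
        return HYPHEN.sub(EM_DASH, line)
    return line


def replace_text(text: str) -> str:
    """Apply replace_line to each line of a multi-line string."""
    return "\n".join(replace_line(line) for line in text.splitlines())
```

Lines such as 1992-1994 or MTS-O do not start with a hyphen, so the first regex rejects them and they pass through unchanged.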

wp78de
0

My guess is that maybe these simple expressions,

(?=-)-

or, more accurately, for lines ending with a period:

(?=-.*\.$)-

with a simple replacement of — might work.

Demo
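For illustration, this is roughly how the second expression could be applied with Python's standard re module (an assumption, since no language is named in the question). The re.MULTILINE flag is needed so that $ matches at the end of every line; to_em_dashes and EM_DASH are hypothetical names.

```python
import re

EM_DASH = "\u2014"

# A hyphen that is followed, later on the same line, by a period at end of line.
PATTERN = re.compile(r"(?=-.*\.$)-", re.MULTILINE)


def to_em_dashes(text: str) -> str:
    """Replace each hyphen on a line that ends with a period."""
    return PATTERN.sub(EM_DASH, text)


sample = "-hi there.\n-hello-.\n1992-1994\nMTS-O"
converted = to_em_dashes(sample)
```

Note that this replaces any hyphen on a line ending with a period, so a line like 1992-1994. would also be changed; the pattern does not check that the line starts with a dash.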

Community
Emma