The regex is intended for use in a text-to-speech program, which struggles with works that include stretched words or barely pronounceable scene breaks, like AAAAAAAAARRRRRRGGHHHH!!!! or XXXXXXXXXXXXXXXXXX. These aren't a problem when reading, but the text-to-speech engine gives up on pronouncing them and ends up reading out each individual letter.
The text-to-speech program has a pronunciation-adjustment feature that supports regex, which I need because a simple find and replace is inadequate.
The regex needs to find any string of characters that repeats 3 or more times in a row (optionally separated by spaces), but only actually select (and hence replace) the third and subsequent occurrences.
https://regex101.com/r/Z6zVOg/2
The best I have managed is this:
(?|(?'a'.*)\k'a'(\1))\1
I have a number of sample lines, arranged in pairs: each input line should be transformed into the line that follows it. Only some of them seem to work:
The quick brown fox jumps over the lazy lazy lazy dog.
The quick brown fox jumps over the lazy lazy dog.
Attack Attack Attack Attack Attack Attack
Attack Attack
Attack!!!!!!
Attack!!
WAAAAAAGGGGGGHHHHHH!!!
WAAGGHH!!
Attack Whatever Attack Attack
Attack Whatever Attack Attack
The quick brown fox jumps over the lazy lazy lazy dog.
The quick brown fox jumps over the lazy lazy dog.
Attack Attack
Attack Attack
Attack!!
Attack!!
WAAGGHH!!!
WAAGGHH!!
Attack Whatever Attack Attack
Attack Whatever Attack Attack
This This This Friend Friend Friend
This This Friend Friend
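To check candidate patterns against the sample pairs above without retyping them into regex101 each time, a small harness can help. This is a sketch assuming Python's `re` module, which may not behave identically to whatever regex engine @voice uses; `failures`, `PATTERN`, `REPLACEMENT` and `SAMPLES` are hypothetical names. It plugs in jhnc's pattern from the solution at the end of this post, with the `$1$2` replacement written in Python's `\1\2` form.

```python
import re

# jhnc's pattern from the solution; Python's re.sub writes $1$2 as \1\2.
PATTERN = re.compile(r"(.+?) *\1( *\1)+")
REPLACEMENT = r"\1\2"

# (input, expected output) pairs taken from the sample lines above.
SAMPLES = [
    ("The quick brown fox jumps over the lazy lazy lazy dog.",
     "The quick brown fox jumps over the lazy lazy dog."),
    ("Attack Attack Attack Attack Attack Attack", "Attack Attack"),
    ("Attack Attack", "Attack Attack"),
    ("Attack!!!!!!", "Attack!!"),
    ("Attack!!", "Attack!!"),
    ("WAAAAAAGGGGGGHHHHHH!!!", "WAAGGHH!!"),
    ("WAAGGHH!!!", "WAAGGHH!!"),
    ("Attack Whatever Attack Attack", "Attack Whatever Attack Attack"),
    ("This This This Friend Friend Friend", "This This Friend Friend"),
]


def failures(pattern, replacement, samples):
    """Return (input, expected, actual) for every sample the substitution gets wrong."""
    bad = []
    for text, expected in samples:
        actual = pattern.sub(replacement, text)
        if actual != expected:
            bad.append((text, expected, actual))
    return bad


if __name__ == "__main__":
    bad = failures(PATTERN, REPLACEMENT, SAMPLES)
    for text, expected, actual in bad:
        print(f"FAIL: {text!r} -> {actual!r} (expected {expected!r})")
    print(f"{len(SAMPLES) - len(bad)} of {len(SAMPLES)} samples passed")
```

Swapping in a different pattern or replacement string only requires changing `PATTERN` and `REPLACEMENT`; every pair in `SAMPLES` passes with jhnc's pattern under Python's `re`.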
Edit: While both given solutions do work in regex101, they don't appear to work in @voice, so I am currently trying to figure out exactly which regex flavor it uses.
- jhnc: "regex101.com/r/zM2qfC/1 ?" Result: it just removes all matching text.
- jhnc: "variant: regex101.com/r/HJ1jmN/1" Result: error message "Unrecognized backslash escape sequence near index 16".
Solution (from jhnc): pattern (.+?) *\1( *\1)+ with replacement $1$2. I just needed to add the replacement, which I hadn't noticed before.
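Why the $1$2 replacement leaves exactly two copies: (.+?) captures the shortest repeating unit, " *\1" requires a second copy (with optional spaces before it), and "( *\1)+" requires at least one more copy and greedily consumes the rest. A repeated capturing group only keeps its last iteration, so group 2 holds a single copy plus its leading spaces, and $1$2 rebuilds the first two copies. The sketch below inspects the groups using Python's `re` (an assumption; @voice's engine may differ in details); `show_groups` is a hypothetical helper.

```python
import re

# Same pattern as the solution above.
PATTERN = re.compile(r"(.+?) *\1( *\1)+")


def show_groups(text):
    """Return (whole match, group 1, group 2) for the first repeated run, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group(0), m.group(1), m.group(2)


if __name__ == "__main__":
    # Group 2 keeps only the last repetition (" lazy"), not all of them.
    print(show_groups("lazy lazy lazy lazy dog"))
    print(show_groups("Attack!!!!!!"))
    print(show_groups("Attack Attack"))  # only two copies, so no match
```

Strictly speaking, this pattern matches the whole run rather than only the third and later copies, but because the replacement puts the first two copies back, the result is the same as deleting only the extra ones.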