(Resolved) Regex to select (replace) third and further duplicates of any string(s) in a single line

Question

The regex is intended for use in a text-to-speech program, which struggles with works that include stretched words or pronounceable-ish scene breaks, like AAAAAAAAARRRRRRGGHHHH!!!! or XXXXXXXXXXXXXXXXXX. These aren't an issue when reading, but the text-to-speech ends up reading out each individual letter after giving up on pronunciation.

The text-to-speech program has a pronunciation-adjustment feature that supports regex, which is needed because a simple find and replace is inadequate.

The regex needs to find any string of characters that repeats 3 or more times in a row, but only actually select (and hence replace) the third and further instances of it.

The best I have managed is this (https://regex101.com/r/Z6zVOg/2):

(?|(?'a'.*)\k'a'(\1))\1

I have a number of sample lines, each of which should, after replacement, match the line that follows it, but only some of them seem to work:

The quick brown fox jumps over the lazy lazy lazy dog.
The quick brown fox jumps over the lazy lazy dog.
Attack Attack Attack Attack Attack Attack
Attack Attack 
Attack!!!!!!
Attack!!
WAAAAAAGGGGGGHHHHHH!!!
WAAGGHH!!
Attack Whatever Attack Attack
Attack Whatever Attack Attack

The quick brown fox jumps over the lazy lazy lazy dog.
The quick brown fox jumps over the lazy lazy dog.
Attack Attack
Attack Attack 
Attack!!
Attack!!
WAAGGHH!!!
WAAGGHH!!
Attack Whatever Attack Attack
Attack Whatever Attack Attack
This This This Friend Friend Friend
This This Friend Friend

Edit: While both given solutions do indeed work in Regex101, they don't appear to work in @voice, so I am currently trying to figure out exactly which variant of regex it uses.

regex101.com/r/zM2qfC/1 ? – jhnc: in @voice, this just removes all matching text.
variant: regex101.com/r/HJ1jmN/1 – jhnc: in @voice, this gives the error message "Unrecognized backslash escape sequence near index 16".

Solution (jhnc): pattern (.+?) *\1( *\1)+ with replacement $1$2. I just needed to add the replacement, which I hadn't noticed before.
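For anyone who wants to check jhnc's pattern outside @voice, here is a minimal sketch that uses Python's `re` module as a stand-in engine (an assumption: @voice's regex flavour was never pinned down in this thread, so details may differ). Group 1 captures the first copy, ` *\1` matches the second, and `( *\1)+` consumes the third and any further copies while group 2 keeps only the last of them, so writing back `$1$2` (spelled `\1\2` in Python) leaves exactly two copies. The helper name `keep_two_repeats` and the `SAMPLES` list are hypothetical.

```python
import re

# jhnc's pattern from the thread. Group 1 is the first copy of the repeated
# text, " *\1" matches the second copy, and ( *\1)+ consumes the third and any
# further copies; group 2 ends up holding only the last of those copies
# (including any spaces in front of it).
DEDUP_PATTERN = re.compile(r"(.+?) *\1( *\1)+")


def keep_two_repeats(line: str) -> str:
    # Hypothetical helper: keep two copies of any run of 3+ repeats.
    # Python writes the replacement backreferences with backslashes
    # instead of the $1$2 syntax used in @voice / regex101.
    return DEDUP_PATTERN.sub(r"\1\2", line)


# (input, expected) pairs based on the sample lines in the question.
SAMPLES = [
    ("The quick brown fox jumps over the lazy lazy lazy dog.",
     "The quick brown fox jumps over the lazy lazy dog."),
    ("Attack Attack Attack Attack Attack Attack", "Attack Attack"),
    ("Attack!!!!!!", "Attack!!"),
    ("WAAAAAAGGGGGGHHHHHH!!!", "WAAGGHH!!"),
    ("Attack Whatever Attack Attack", "Attack Whatever Attack Attack"),
    ("This This This Friend Friend Friend", "This This Friend Friend"),
]

if __name__ == "__main__":
    for original, expected in SAMPLES:
        result = keep_two_repeats(original)
        print(f"{result!r:60} ok={result == expected}")
```

The lazy `.+?` makes the engine try the shortest repeating unit first, which is why a stretched word like WAAAAAA is reduced letter by letter rather than as a longer chunk.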

@Ranakastrasz – Regarding _regex101.com/r/zM2qfC/1 ? – jhnc Just removes all matching text_: Just to be sure – you didn't forget to specify `$1$2` as the substitution? — Armali, Jan 25 '23 at 07:19
What is @voice? SO is a site for programming issues, it is not meant to be a replacement for 3rd party tool support services. — Wiktor Stribiżew, Jan 25 '23 at 08:56
@Wiktor Stribizew: atvoice is an Android mobile app for text-to-speech conversion. It has an option for conditional pronunciation changes, which runs on either find and replace or regex. Once I (incorrectly) determined it wasn't a programming(?) issue, I explained that and went to contact atvoice support. — Ranakastrasz, Jan 25 '23 at 14:20

gregko · Answer 1 · 2023-01-25T16:20:58.233


Pattern: (.)\1{2,}

Replace: $1

This will replace, for example, WAAAAAAGGGGGGHHHHHH!!! with WAGH! (the three exclamation marks are a run too). In general, any occurrence of 3 or more of the same character is replaced with just one character.
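A quick way to see what this first pattern does is to run it through Python's `re` module, used here only as a stand-in for @voice's engine (an assumption; the two flavours may differ in details). The answer's `$1` becomes `\1` in Python's replacement syntax, and the helper name `collapse_runs` is hypothetical.

```python
import re

# gregko's first pattern: one character captured in group 1, followed by at
# least two more copies of it, i.e. any run of 3 or more identical characters.
RUN_OF_THREE_OR_MORE = re.compile(r"(.)\1{2,}")


def collapse_runs(text: str) -> str:
    # Replace the whole run with a single copy of the character
    # (the answer's "$1" is written r"\1" in Python).
    return RUN_OF_THREE_OR_MORE.sub(r"\1", text)


if __name__ == "__main__":
    print(collapse_runs("WAAAAAAGGGGGGHHHHHH!!!"))     # WAGH!
    print(collapse_runs("AAAAAAAAARRRRRRGGHHHH!!!!"))  # ARGGH!
    print(collapse_runs("XXXXXXXXXXXXXXXXXX"))         # X
    print(collapse_runs("Attack!!"))                   # Attack!! (runs of 2 are kept)
```

Unlike jhnc's pattern, this collapses a run to one character rather than two; using `$1$1` (r"\1\1" in Python) as the replacement would keep two copies instead.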

To replace words repeated 3 or more times with just one word:

Pattern: (\b(\w+)\W*)\1{2,}\2*

Replace: $1
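The word-level pattern can be sketched the same way, again with Python's `re` standing in for @voice's engine (an assumption) and a hypothetical helper name, `collapse_repeated_words`. The comments spell out what each part of the pattern matches.

```python
import re

# gregko's second pattern:
#   group 2 = a whole word (\b(\w+) starts at a word boundary)
#   group 1 = that word plus any non-word separator after it (\W*)
#   \1{2,}  = at least two more identical "word + separator" chunks
#   \2*     = zero or more further copies of the bare word,
#             e.g. a final copy with no separator after it
REPEATED_WORD = re.compile(r"(\b(\w+)\W*)\1{2,}\2*")


def collapse_repeated_words(text: str) -> str:
    # Keep only the first "word + separator" chunk (the answer's "$1").
    return REPEATED_WORD.sub(r"\1", text)


if __name__ == "__main__":
    print(collapse_repeated_words(
        "The quick brown fox jumps over the lazy lazy lazy dog."))
    # The quick brown fox jumps over the lazy dog.
    print(repr(collapse_repeated_words(
        "Attack Attack Attack Attack Attack Attack")))
    # 'Attack '
    print(collapse_repeated_words("This This This Friend Friend Friend"))
    # This Friend Friend Friend
```

Tracing the sample lines through this sketch shows one edge case: each `\1` repeat must include the trailing separator, so a run of exactly three copies at the very end of a line (the Friend run above) is left untouched, while runs of four or more are caught because `\2*` absorbs the final separator-less copy. It also reduces a run to one copy rather than the two the question asked for.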

answered Jan 25 '23 at 15:15 by gregko, edited Jan 25 '23 at 16:20