Perl Regex for Substituting Any Character

Question

Essentially, I want to replace the u between the random character and the k to be an o. The output I should get from the substitution is dudok and rujok.

How can I do this in Perl? I'm very new to Perl so go easy on me.

This is what I have right now:

$text = "duduk, rujuk";
$_ = $text;
s/.uk/ok/g;
print $_; #Output: duok, ruok Expected: dudok, rujok

EDIT: Forgot to mention that the last syllable is the only one that should be changed. Also, the random character is specifically supposed to be a random consonant, not just any random character.

I should mention that this is all based on Malay language rules for grapheme to phoneme conversion.

dukuk would become dukok as the last syllable should be changed only. Though dukuk is not an actual Malay word! Haha — Asad, Dec 30 '19 at 14:07

ikegami · Accepted Answer · 2020-01-06T13:19:01.863

According to this page, the Malay language uses an unaccented Latin alphabet, and it has the same consonants as English. However, its digraphs are different from English's.

ai vowel
au vowel
oi vowel
gh consonant
kh consonant
ng consonant
ny consonant
sy consonant

So, if one wanted to find a syllable ending with uk, one would look for

<syllable_boundary>(?:[bcdfhjlmpqrtvwxyz]|gh?|kh?|n[gy]?|sy?)uk

or

<syllable_boundary>uk

The OP is specifically not interested in the latter, so we simply need to look for

<syllable_boundary>(?:[bcdfhjlmpqrtvwxyz]|gh?|kh?|n[gy]?|sy?)uk

So now, we have to determine how to find a syllable boundary. ...or do we? All the consonant digraphs end with a consonant, and none of the vowel digraphs end in a consonant so we simply need to look for

[bcdfghjklmnpqrstvwxyz]uk
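
As a rough check of that simplification, here is a minimal sketch that compares the digraph-aware pattern (with the syllable-boundary placeholder left out, and using the ny and sy digraphs from the list above) against the single-consonant class. The test strings such as nguk and khuk are made-up fragments chosen to exercise the digraphs, not claimed to be Malay words, and the sub name is hypothetical.

```perl
use strict;
use warnings;

# Digraph-aware pattern, without the <syllable_boundary> placeholder.
my $with_digraphs = qr/(?:[bcdfhjlmpqrtvwxyz]|gh?|kh?|n[gy]?|sy?)uk/;

# Simplified pattern: any single consonant directly before "uk".
my $single = qr/[bcdfghjklmnpqrstvwxyz]uk/;

# Returns true when both patterns agree on whether $word matches.
sub patterns_agree {
    my ($word) = @_;
    my $long  = ($word =~ $with_digraphs) ? 1 : 0;
    my $short = ($word =~ $single)        ? 1 : 0;
    return $long == $short;
}

# Illustrative inputs only (assumption: not real vocabulary).
for my $word (qw(duduk rujuk nguk nyuk syuk khuk auk uk)) {
    printf "%-6s %s\n", $word, patterns_agree($word) ? "agree" : "DIFFER";
}
```

Every consonant digraph ends in a consonant, so the last letter before uk is always in the single-consonant class, which is why the two patterns agree on these inputs.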

Finally, we can use \b to check for the end of the word, so we're interested in matching

[bcdfghjklmnpqrstvwxyz]uk\b

Now, let's use this in a substitution.

s/([bcdfghjklmnpqrstvwxyz])uk\b/$1ok/g

or

s/(?<=[bcdfghjklmnpqrstvwxyz])uk\b/ok/g

or

s/[bcdfghjklmnpqrstvwxyz]\Kuk\b/ok/g

The last one is the most efficient, but it requires Perl 5.10+. (That shouldn't be a problem given how ancient it is.)
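
To compare the three forms side by side, here is a small sketch that wraps each substitution in a sub and runs it on the words from the question. The sub names and the `$consonant` variable are hypothetical helpers introduced for illustration only.

```perl
use strict;
use warnings;

# Consonant class taken from the answer above.
my $consonant = qr/[bcdfghjklmnpqrstvwxyz]/;

# Variant 1: capture the consonant and put it back with $1.
sub with_capture {
    my ($s) = @_;
    $s =~ s/($consonant)uk\b/$1ok/g;
    return $s;
}

# Variant 2: the lookbehind asserts the consonant without consuming it.
sub with_lookbehind {
    my ($s) = @_;
    $s =~ s/(?<=$consonant)uk\b/ok/g;
    return $s;
}

# Variant 3: \K (Perl 5.10+) keeps the consonant out of the replaced text.
sub with_keep {
    my ($s) = @_;
    $s =~ s/$consonant\Kuk\b/ok/g;
    return $s;
}

for my $text ("duduk, rujuk", "dukuk", "auk") {
    printf "%-14s -> %s | %s | %s\n", $text,
        with_capture($text), with_lookbehind($text), with_keep($text);
}
```

All three variants turn "duduk, rujuk" into "dudok, rujok" and "dukuk" into "dukok", and all three leave "auk" alone because a is not in the consonant class.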

Updated to account for the major change to the question. (@ysth) — ikegami, Dec 30 '19 at 21:08
Or quite simply we could use [^aeiou] where it's essentially "not vowel" which is equivalent to "consonant". Correct? — Asad, Dec 31 '19 at 06:16
That matches way too much. And even if it were equivalent, I doubt it would be faster – ikegami, Dec 31 '19 at 06:18

Vesa Karjalainen · Answer 2 · 2019-12-29T17:26:52.007

Change your regex to:

s/(.)uk/$1ok/g;
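
The parentheses form a capture group: whatever `.` matches is saved in `$1`, and the replacement puts it back in front of `ok`. A minimal sketch of the difference, using the text from the question (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $text = "duduk, rujuk";

# Without a capture group, the character matched by "." is replaced as well.
(my $lost = $text) =~ s/.uk/ok/g;

# With (.), the matched character is saved in $1 and restored in the replacement.
(my $kept = $text) =~ s/(.)uk/$1ok/g;

# This pattern is not tied to the end of the word, so the first "uk" is the one changed.
(my $wrong_syllable = "dukuk") =~ s/(.)uk/$1ok/g;

print "$lost\n";            # duok, ruok
print "$kept\n";            # dudok, rujok
print "$wrong_syllable\n";  # dokuk
```

The last line shows why this pattern alone does not meet the later requirement that only the final syllable change: dukuk becomes dokuk rather than dukok.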

edited Dec 29 '19 at 17:26

answered Dec 29 '19 at 16:13

Cool, works marvelously. Thank you so much! Can you explain a bit on what it's doing? Is it storing the character to a question mark and then we access that question mark with $1? Please help me on the proper terms. Haha – Asad Dec 29 '19 at 16:18
Fixed the obvious mistake :) – Vesa Karjalainen Dec 29 '19 at 17:28

Asad · Answer 3 · 2020-01-06T08:44:46.773

As ikegami raised, the word "bukuk" would have two substitutions. This is not the desired outcome as only the last syllable should be changed. Also, I forgot to mention that the change should only be done for a random consonant, u, and followed by k (e.g. ruk, not auk).

As such, taking everything into account that has been answered, the correct regex should be:

s/(\w*[bcdfghjklmnpqrstvwxyz])uk\b/$1ok/g;

EDIT: As ikegami has raised again, the complement of vowels - [^aeiou] will match for other characters like "-" and " " which is undesired. Updated the solution.
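
To illustrate the difference between the two character classes, here is a small sketch. The inputs "x-uk", "a uk" and "7uk" are made-up strings chosen only to show the over-matching, and the sub names are hypothetical.

```perl
use strict;
use warnings;

# Negated class: anything that is not a lowercase vowel, including "-", " " and digits.
sub negated_class {
    my ($s) = @_;
    $s =~ s/([^aeiou])uk\b/$1ok/g;
    return $s;
}

# Explicit class: only the listed consonants.
sub explicit_class {
    my ($s) = @_;
    $s =~ s/([bcdfghjklmnpqrstvwxyz])uk\b/$1ok/g;
    return $s;
}

# Illustrative inputs (assumption: the last three are not real words).
for my $s ("rujuk", "auk", "x-uk", "a uk", "7uk") {
    printf "%-6s negated=%-6s explicit=%s\n",
        "'$s'", "'" . negated_class($s) . "'", "'" . explicit_class($s) . "'";
}
```

Both subs turn rujuk into rujok and leave auk alone, but only the negated class rewrites "x-uk", "a uk" and "7uk".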

edited Jan 06 '20 at 08:44

answered Dec 30 '19 at 05:45

This is based on Malay language rules. Sorry I didn't specify about the last syllable rule in the question. I'll edit it in now. – Asad Dec 30 '19 at 14:03
Re "*Sorry I didn't specify about the last syllable rule in the question.*", That's not what I was saying; I was saying you didn't check if the match was in the last syllable. But that's wrong. I had overlooked the `\b`. Sorry, downvote and comment removed. (As well as the request to fix the question, which is now moot) – ikegami Dec 30 '19 at 20:41
Ah right, because I'm using complement of the vowels. Will update the regex then. – Asad Jan 06 '20 at 08:43
All right now that I understand the nuances a bit better, I've decided to just accept your answer to the question. Thank you so much for your input! – Asad Jan 06 '20 at 13:17
