-2

Essentially, I want to replace the u between a random character and the k with an o. The output I should get from the substitution is dudok and rujok.

How can I do this in Perl? I'm very new to Perl so go easy on me.

This is what I have right now:

$text = "duduk, rujuk";
$_ = $text;
s/.uk/ok/g;
print $_; #Output: duok, ruok Expected: dudok, rujok

EDIT: Forgot to mention that the last syllable is the only one that should be changed. Also, the random character is specifically supposed to be a random consonant, not just any random character.

I should mention that this is all based on Malay language rules for grapheme to phoneme conversion.

Asad

3 Answers

2

According to this page, the Malay language uses an unaccented Latin alphabet, and it has the same consonants as English. However, its digraphs differ from those of English.

  • ai vowel
  • au vowel
  • oi vowel
  • gh consonant
  • kh consonant
  • ng consonant
  • ny consonant
  • sy consonant

So, if one wanted to find a syllable ending with uk, one would look for

<syllable_boundary>(?:[bcdfhjlmpqrtvwxyz]|gh?|kh?|n[gy]?|sy?)uk

or

<syllable_boundary>uk

The OP is specifically uninterested in the latter, so we simply need to look for

<syllable_boundary>(?:[bcdfhjlmpqrtvwxyz]|gh?|kh?|n[gy]?|sy?)uk

So now, we have to determine how to find a syllable boundary. ...or do we? All the consonant digraphs end with a consonant, and none of the vowel digraphs end in a consonant so we simply need to look for

[bcdfghjklmnpqrstvwxyz]uk

Finally, we can use \b to check for the end of the word, so we're interested in matching

[bcdfghjklmnpqrstvwxyz]uk\b

Now, let's use this in a substitution.

s/([bcdfghjklmnpqrstvwxyz])uk\b/$1ok/g

or

s/(?<=[bcdfghjklmnpqrstvwxyz])uk\b/ok/g

or

s/[bcdfghjklmnpqrstvwxyz]\Kuk\b/ok/g

The last one is the most efficient, but it requires Perl 5.10+. (That shouldn't be a problem given how ancient it is.)
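As a quick check, the three forms can be run side by side on the sample text from the question. This is only a minimal sketch: the variable names and the precompiled `$cons` pattern are introduced here for illustration and are not part of the answer above.

```perl
use strict;
use warnings;

my $text = "duduk, rujuk";

# Assumption for this sketch: the consonant class is precompiled once
# so that all three substitutions share exactly the same pattern.
my $cons = qr/[bcdfghjklmnpqrstvwxyz]/;

# 1. Capture the consonant and put it back with $1.
(my $with_capture = $text) =~ s/($cons)uk\b/$1ok/g;

# 2. Lookbehind: the consonant must precede "uk" but is not part of the match.
(my $with_lookbehind = $text) =~ s/(?<=$cons)uk\b/ok/g;

# 3. \K: match the consonant, then discard it from the text being replaced.
(my $with_keep = $text) =~ s/$cons\Kuk\b/ok/g;

print "$_\n" for $with_capture, $with_lookbehind, $with_keep;
# Each line prints: dudok, rujok
```

The `(my $copy = $text) =~ s/.../.../` idiom copies the string first, so `$text` itself stays unchanged between the three runs.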

ikegami
0

Change your regex to:

s/(.)uk/$1ok/g;
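To illustrate why the parentheses matter, here is a small sketch comparing the attempt from the question with the capturing version. The variable names `$lost` and `$kept` are made up for this example.

```perl
use strict;
use warnings;

my $text = "duduk, rujuk";

# Attempt from the question: the character matched by "." is part of the
# match, so it gets replaced along with "uk" and disappears.
(my $lost = $text) =~ s/.uk/ok/g;

# With a capture group: (.) stores the matched character in $1,
# and the replacement puts it back in front of "ok".
(my $kept = $text) =~ s/(.)uk/$1ok/g;

print "$lost\n";   # duok, ruok
print "$kept\n";   # dudok, rujok
```

The parentheses form a capture group, and `$1` refers to whatever the first group matched in the current match.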
Vesa Karjalainen
  • Cool, works marvelously. Thank you so much! Can you explain a bit on what it's doing? Is it storing the character to a question mark and then we access that question mark with $1? Please help me on the proper terms. Haha – Asad Dec 29 '19 at 16:18
  • Fixed the obvious mistake :) – Vesa Karjalainen Dec 29 '19 at 17:28
0

As ikegami raised, the word "bukuk" would have two substitutions. This is not the desired outcome, as only the last syllable should be changed. Also, I forgot to mention that the change should only apply when a consonant is followed by u and then k (e.g. ruk, not auk).

As such, taking everything into account that has been answered, the correct regex should be:

s/(\w*[bcdfghjklmnpqrstvwxyz])uk\b/$1ok/g;
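A small sketch to check this regex against the cases discussed here. The helper name `fix_final_uk` is hypothetical and only wraps the substitution above so it can be applied to several inputs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): applies the substitution
# above to a copy of the input and returns the result.
sub fix_final_uk {
    my ($s) = @_;
    $s =~ s/(\w*[bcdfghjklmnpqrstvwxyz])uk\b/$1ok/g;
    return $s;
}

for my $word ("duduk, rujuk", "bukuk", "auk") {
    printf "%-14s -> %s\n", $word, fix_final_uk($word);
}
# duduk, rujuk   -> dudok, rujok
# bukuk          -> bukok   (only the final syllable changes)
# auk            -> auk     (u is preceded by a vowel, so no change)
```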

EDIT: As ikegami has raised again, the complement of the vowels, [^aeiou], also matches other characters such as "-" and " ", which is undesired. I've updated the solution.

Asad
  • This is based on Malay language rules. Sorry I didn't specify about the last syllable rule in the question. I'll edit it in now. – Asad Dec 30 '19 at 14:03
  • Re "*Sorry I didn't specify about the last syllable rule in the question.*", That's not what I was saying; I was saying you didn't check if the match was in the last syllable. But that's wrong. I had overlooked the `\b`. Sorry, downvote and comment removed. (As well as the request to fix the question, which is now moot) – ikegami Dec 30 '19 at 20:41
  • Ah right, because I'm using complement of the vowels. Will update the regex then. – Asad Jan 06 '20 at 08:43
  • All right now that I understand the nuances a bit better, I've decided to just accept your answer to the question. Thank you so much for your input! – Asad Jan 06 '20 at 13:17