Translate accented to unaccented characters in Sublime Text snippet using regex

Question

I'm writing a ST3 snippet that inserts a \subsection{} with a label. The label is created by converting the header text to conform with the LaTeX standards for labels using a (rather lengthy) regular expression:

${1/(?:([ \t_]+)?|\b)(?:([ÅÄÆÁÀÃ])?|\b)(?:([åäæâàáã])?|\b)(?:([ÉÈÊË])?|\b)(?:([éèëê])?|\b)(?:([ÌÌÎÏ])?|\b)(?:([íìïî])?|\b)(?:([Ñ])?|\b)(?:([ñ])?|\b)(?:([ÖØÓÒÔÖÕ])?|\b)(?:([öøóòôõ])?|\b)(?:([ÜÛÚÙ])?|\b)(?:([üûúù])?|\b)/(?1:-)(?2:A)(?3:a)(?4:E)(?5:e)(?6:I)(?7:i)(?8:N)(?9:n)(?10O)(?11:o)(?12:U)(?13:u)/g}

Actually, I would like for it to be even longer. But if I add the extra groups that I would like, then ST3 crashes when I execute the snippet.

${1/(?:([ \t_]+)?|\b)(?:([ÅÄÆÁÀÃ])?|\b)(?:([åäæâàáã])?|\b)(?:([Ç])?|\b)(?:([ç])?|\b)(?:([ÉÈÊË])?|\b)(?:([éèëê])?|\b)(?:([ÌÌÎÏ])?|\b)(?:([íìïî])?|\b)(?:([Ñ])?|\b)(?:([ñ])?|\b)(?:([ÖØÓÒÔÖÕ])?|\b)(?:([öøóòôõ])?|\b)(?:([ÜÛÚÙ])?|\b)(?:([üûúù])?|\b)(?:([Ý])?|\b)(?:([ÿý])?|\b)/(?1:-)(?2:A)(?3:a)(?4:C)(?5:c)(?6:E)(?7:e)(?8:I)(?9:i)(?10:O)(?11:o)(?12:N)(?13:n)(?14:U)(?15:u)(?16:Y)(?17:y)/g}

Is there any more efficient way of doing this? Preferably one that won't cause ST3 to crash ;)

Edit: Here are some example strings:

Flygande bæckasiner søka hwila på mjuka tuvor
Åke Staël hade en överflödig idé

And the results (with the current, working regex):

Flygande-backasiner-soka-hwila-pa-mjuka-tuvor
Ake-Stael-hade-en-overflodig-ide

But I would like to also replace the characters (ÇçÝÿý) with their unaccented counterparts (CcYyy) so that e.g.

Comment ça va

becomes

Comment-ca-va

`(?:([ÅÄÆÁÀÃ])?|\b)` is an always true pattern because the first alternative can be empty, remove the useless `?` quantifier. Give several example strings and expected results. — Casimir et Hippolyte, Mar 16 '15 at 21:07

Casimir et Hippolyte · Accepted Answer · 2015-03-16T22:25:06.430

I don't know this syntax, but I suspect that the problem comes from the too many optional groups combined with a lot of alternatives that cause a too complex processing.

So you can try to design your pattern like this, and you can add other groups of letters in the same way (take a look at the unicode table to find character ranges):

${1/([ \t_]+)|([À-Å])|([à-å])|([È-Ë])|([è-ë])|([Ì-Ï])|([ì-ï])|([Ò-ÖØ])|([ò-öø])|([Ù-Ü])|([ù-ü])|(Æ)|(æ)|(Œ)|(œ)|(Ñ)|(ñ)/(?1:-)(?2:A)(?3:a)(?4:E)(?5:e)(?6:I)(?7:i)(?8:O)(?9:o)(?10:U)(?11:u)(?12:AE)(?13:ae)(?14:OE)(?15:oe)(?16:N)(?17:n)/g}

if the lookahead feature is available you can improve this pattern to prevent non-accented characters to be tested with each alternatives:

${1/(?=[ \t_À-ÆÈ-ÏÑ-ÖØ-Üà-æè-ïñ-öø-üŒœ])(?:([ \t_]+)|([À-Å])|([à-å])|([È-Ë])|([è-ë])|([Ì-Ï])|([ì-ï])|([Ò-ÖØ])|([ò-öø])|([Ù-Ü])|([ù-ü])|(Æ)|(æ)|(Œ)|(œ)|(Ñ)|(ñ))/(?1:-)(?2:A)(?3:a)(?4:E)(?5:e)(?6:I)(?7:i)(?8:O)(?9:o)(?10:U)(?11:u)(?12:AE)(?13:ae)(?14:OE)(?15:oe)(?16:N)(?17:n)/g}

Note: Æ (Aelig) must be transliterated as AE (the same for Œ => OE)

Translate accented to unaccented characters in Sublime Text snippet using regex

1 Answers1