In InDesign I was hoping [\l]{4}(?=\s) would find the last four letters of each word, but the GREP did not work. I want to put the result in the page header as the suffix. I tried various things with \b and $, but nothing worked. And http://regex101.com/r/uQ7xR3/1 does not work in InDesign, because it's PHP flavour.

There are several additional conditions. If the 5th letter is h, then we should take the last 5 letters of each word instead of 4. But we do not take anything separated by an \s, nor do we take ... or anything inside | (like | ā |).

virūpacakṣus dharmacakṣus nayacakṣus sūryacakṣus divyacakṣus saṃgrah āsaṃgrah upasaṃgrah pratisaṃgrah abhisaṃgrah anusaṃgrah

Update. Let me add more limitations. It is not just a single "h": if the word contains one of the combinations kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh, take the last 5 letters instead of only the last 4. The same goes for ai|au, which should not be split.

General case:
1) From vṛddhāpacāyitva take itva.

Two exceptions:
2) kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh is treated like a single letter, as in Devanagari script. From nakhāli take khāli instead of just hāli. From mirikha take rikha instead of just ikha.
3) ai|au is treated like a single letter, as in Devanagari script. From mahahrauḍ take hrauḍ instead of just rauḍ. From ekaikaivat take aivat instead of just ivat.
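Outside InDesign, the rules above can be checked with a small script. The following is only a sketch in Python, not InDesign GREP: it splits a word into "units" (a listed digraph counts as one unit, any other letter as one unit) and joins the last four. The names DIGRAPHS, UNIT and suffix are made up for this illustration, and the sketch covers only the digraph rules from the update, not the earlier "5th letter is h" wording.

```python
import re
import unicodedata

# Digraphs that count as a single letter (list taken from the question).
DIGRAPHS = ["ai", "au", "kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh"]
DIGRAPHS = [unicodedata.normalize("NFC", d) for d in DIGRAPHS]

# Try the digraphs first, then fall back to any single Unicode letter.
# [^\W\d_] is Python's closest equivalent to "any letter".
UNIT = re.compile("|".join(DIGRAPHS) + r"|[^\W\d_]")


def suffix(word, n=4):
    """Return the last n units of word, keeping digraphs unsplit."""
    # Assumption: the text uses (or can be normalized to) precomposed characters.
    word = unicodedata.normalize("NFC", word)
    units = UNIT.findall(word)
    return "".join(units[-n:])


for w in ["vṛddhāpacāyitva", "nakhāli", "mirikha", "mahahrauḍ", "ekaikaivat"]:
    print(w, "->", suffix(w))
```

For the five words quoted in the rules this should print itva, khāli, rikha, hrauḍ and aivat respectively.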

Cyrus
gasyoun

2 Answers

Be careful when stating "it does not work", and the reasoning behind it. Your initial GREP [\l]{4}(?=\s) does work in InDesign (although the [..] are superfluous).

Similarly, the linked \w\w\w\w$ also works, and it has nothing to do with "PHP flavor". There are three reasons why only the last occurrence is highlighted: (1) the $ matches at end-of-story only, and adding the m multi-line flag makes it work for individual lines; (2) with m, only the first instance is highlighted (the default), and you need g to get them all; but most importantly, (3) \w in a general GREP parser may not be Unicode-aware, and in this case you can see it isn't, because \w does not pick up the accented letters in your sample. InDesign's GREP, on the other hand, is Unicode-aware.
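The three effects can be reproduced outside regex101. The sketch below uses Python's re module as a stand-in (an assumption for illustration; it is neither the PHP flavor nor InDesign): re.MULTILINE plays the role of the m flag, findall plays the role of g, and re.ASCII switches off Unicode awareness for \w.

```python
import re

text = "virūpacakṣus\ndharmacakṣus\nnayacakṣus"

# (1) Without the multi-line flag, $ only anchors at the very end of the text,
#     so only the last line can match.
last_only = re.findall(r"\w{4}$", text)

# (2) With re.MULTILINE, $ anchors at the end of every line;
#     findall returns all matches, like the g flag.
per_line = re.findall(r"\w{4}$", text, re.MULTILINE)

# (3) With re.ASCII, \w no longer matches ṣ, so no line ends in four "word" characters.
ascii_only = re.findall(r"\w{4}$", text, re.MULTILINE | re.ASCII)

print(last_only)
print(per_line)
print(ascii_only)
```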

The following expression will work on the specific examples you supplied; the other "single letter" combinations can possibly be added in a similar way.

(au|ai|kh|\l){4}h?\b

When applied to your sample words:

[Image: grep with complications]

Jongware
Perhaps try:

[[:alpha:]]{4}h?\b

For your additional qualifications, you can try:

 (?:ai|au|kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh|[[:alpha:]]){4}h?\b

Again, as before, you will need to replace the POSIX class for letters with whatever token is the equivalent in InDesign.
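As an illustration of such a translation (to Python's re module, not to InDesign), the POSIX class [[:alpha:]] can be swapped for [^\W\d_], which matches any Unicode letter in Python 3. This is only a sketch; PATTERN and sample are names made up for the example.

```python
import re
import unicodedata

# Same expression as above, with [[:alpha:]] replaced by Python's [^\W\d_].
PATTERN = re.compile(
    unicodedata.normalize(
        "NFC",
        r"(?:ai|au|kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh|[^\W\d_]){4}h?\b",
    )
)

sample = "vṛddhāpacāyitva nakhāli mirikha mahahrauḍ ekaikaivat"

# Assumption: the text can be normalized to precomposed (NFC) characters,
# so that each accented letter is a single code point.
matches = PATTERN.findall(unicodedata.normalize("NFC", sample))
print(matches)
```

Because the group is non-capturing, findall returns the whole match for each word: itva, khāli, rikha, hrauḍ and aivat for this sample.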

Ron Rosenfeld
  • Thanks, it works in http://rubular.com/r/IL3hvsdDzf, but fails in InDesign, so the GREPs must be different. It also does not work at http://regexpal.com/, http://www.online-utility.org/text/grep.jsp or http://www.regexr.com/39ogg – gasyoun Oct 17 '14 at 06:25
  • I don't know of an online tester for InDesign. As for the online testers you mention, since they are designed for different flavors, why would you expect the regex to work without appropriate translation? Regexpal is for JavaScript; I'm not sure about the last link, but you can easily get it to work by making an appropriate translation for the [[:alpha:]] token and the anchor. Make the same translation for your InDesign flavor and it should work. First try \S in place of [[:alpha:]]. If that doesn't work, you've got something else wrong with your process. – Ron Rosenfeld Oct 17 '14 at 11:08