I'd like to match the lowercase version of an uppercase character in a backreference in a regex. For example, let's say I want to match a string where the 1st character is any uppercase character and the 4th character is the same letter as the first except it's a lowercase character. If I use grep with this regex:

grep -E "([A-Z])[a-z]{2}\1[a-z]"

it would match "EssEx" and "SusSe", for instance. I'd like to match "Essex" and "Susse" instead. Is it possible to modify the above regular expression to achieve this?
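
For reference, the behaviour described above can be reproduced with a handful of sample words. The word list is made up for illustration, and the sketch assumes a grep that accepts backreferences with -E (GNU grep does):

```shell
# Hypothetical sample words, fed to grep one per line.
matches=$(printf '%s\n' EssEx SusSe Essex Susse Esssx |
    grep -E '([A-Z])[a-z]{2}\1[a-z]')

# Only the words that repeat the uppercase letter are kept.
printf '%s\n' "$matches"
# prints:
# EssEx
# SusSe
```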

Manos Nikolaidis
  • Well, try `"[A-Z][a-z]{4}"` – Wiktor Stribiżew Jan 31 '17 at 18:26
  • That would also match "Esssx" for example. I only want it to match if it's a lower case version of the same letter that's in the backreference. E.g `a` for `A`, `q` for `Q`. Not any other lowercase character. – Manos Nikolaidis Jan 31 '17 at 18:31
  • Are inline modifiers supported? If yes, good old `([A-Z])[a-z]{2}(?-i)(?!\1)(?i)\1[a-z]*` should work. – Sebastian Proske Jan 31 '17 at 18:36
  • @SebastianProske that works as expected if I use `grep -P` which is fine. If you post an answer with this I'll accept it because it uses grep and is slightly simpler/shorter than @anubhava's answer. – Manos Nikolaidis Jan 31 '17 at 18:45

2 Answers

It will be more verbose but this awk does the job:

# 1st char uppercase, 2nd-3rd lowercase, 4th == lowercase of 1st, 5th lowercase
awk '/^[A-Z][a-z]{2}/ && tolower(substr($1, 1, 1)) == substr($1, 4, 1) &&
     substr($1, 5, 1) ~ /[a-z]/' file

Essex
Susse
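
Some awk implementations (older mawk, for instance) do not support the `{2}` interval expression, so here is a spelled-out sketch of the same positional check that avoids it. The sample words are made up for illustration, and `LC_ALL=C` is only there on the assumption that `[A-Z]` and `[a-z]` should mean plain ASCII ranges:

```shell
# Hypothetical sample words, fed to awk one per line.
result=$(printf '%s\n' EssEx SusSe Essex Susse Esssx EsseX |
LC_ALL=C awk '{
    first = substr($1, 1, 1)
    # position 1 uppercase, positions 2-3 lowercase, position 5 lowercase
    ok = first ~ /^[A-Z]$/ &&
         substr($1, 2, 2) ~ /^[a-z][a-z]$/ &&
         substr($1, 5, 1) ~ /^[a-z]$/
    # position 4 must be the lowercase form of position 1
    if (ok && substr($1, 4, 1) == tolower(first)) print
}')

printf '%s\n' "$result"
# prints:
# Essex
# Susse
```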
Martin
anubhava
  • I've not seen you here for ages then, bam, you come along and shine a light for us mere mortals! – Martin Jan 31 '17 at 18:36

This is one of the cases where inline modifiers come in handy. Here is a solution that uses a case-sensitive lookahead to check that the next character is not exactly the same (uppercase) character, followed by a case-insensitive backreference to match the corresponding lowercase letter:

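([A-Z])[a-z]{2}(?-i)(?!\1)(?i)\1(?-i)[a-z]

Note that the first (?-i) most likely isn't needed, but it's there for clarity. The second (?-i) makes matching case-sensitive again, so that the trailing [a-z] cannot match an uppercase letter such as the X in "EsseX". Inline modifiers are not supported by all regex flavours. PCRE supports them, so you will have to use -P with grep.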
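
A quick way to check this is to run a few sample words through grep -P. The sketch below uses a scoped variant of the same idea, `(?i:\1)`, which limits case-insensitivity to the backreference alone. The word list is made up for illustration, and it assumes a grep built with PCRE support:

```shell
# Hypothetical sample words, one per line.
words='EssEx
SusSe
Essex
Susse
Esssx
EsseX'

# (?!\1)  : case-sensitive lookahead, rejects the same uppercase letter
# (?i:\1) : backreference compared case-insensitively, inside this group only
pattern='([A-Z])[a-z]{2}(?!\1)(?i:\1)[a-z]'

matches=$(printf '%s\n' "$words" | grep -P "$pattern")
printf '%s\n' "$matches"
# prints:
# Essex
# Susse
```

Because the modifier is scoped to the group, the final `[a-z]` stays case-sensitive and "EsseX" is rejected.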

Sebastian Proske