I'm having difficulty using a regular expression (Grep) in TextWrangler to find occurrences of a lowercase letter immediately followed by an uppercase letter. For example:

This announcement meansStudents are welcome.

I want to split each occurrence by inserting a colon and a space, so that it becomes means: Students

I have tried:

[a-z][A-Z]

But this expression does not work in TextWrangler.

*EDIT: here are the exact contexts in which the occurrences appear (I mean only with these font colors).*

<font color =#48B700>  - Stột jlăm wẻ baOne hundred and three<br></font>

<font color =#C0C0C0>     »» Qzống pguộc lyời ba yghìm fảy dyổiTo live a life full of vicissitudes, to live a life marked by ups and downs<br></font>

"baOne" and "dyổiTo" must become "ba: One" and "dyổi: To".

Could anyone help? Many thanks.

Niamh Doyle

4 Answers

I believe (though I don't have TextWrangler at hand) that you need to search for ([a-z])([A-Z]) and replace it with \1: \2
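
For anyone who wants to check the pattern outside the editor, here is a minimal sketch in Python. Python's re module is not TextWrangler's Grep engine, so this only illustrates the capture-group idea; the function name add_colon is made up for this example.

```python
import re

# Case-sensitive by default: [a-z] matches only lowercase, [A-Z] only uppercase.
PAIR = re.compile(r"([a-z])([A-Z])")

def add_colon(text: str) -> str:
    """Insert ': ' between a lowercase letter and the uppercase letter after it."""
    return PAIR.sub(r"\1: \2", text)

print(add_colon("This announcement meansStudents are welcome."))
# This announcement means: Students are welcome.
print(add_colon("FileMaker"))
# File: Maker
```

The second call shows that the pattern has no notion of context: it also splits ordinary CamelCase words.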

Hope this helps.

Igor Korkhov
  • Nope! It just finds any adjacent letters. – Niamh Doyle Jan 06 '12 at 10:39
  • Any adjacent letters, even two lowercase ones? Then maybe you need to tick the 'Case sensitive' box? – Igor Korkhov Jan 06 '12 at 10:44
  • That's exactly the problem. Thank you so much! But now there is another problem: it finds and replaces every occurrence, even unwanted ones, such as turning FileMaker into File: Maker. – Niamh Doyle Jan 06 '12 at 10:52
  • Unfortunately, you haven't described the nature of your text. Of course, the expression I suggested looks for any lowercase letter followed by any uppercase one, regardless of context. Maybe if you give us an example of your text we will be able to provide a better solution. – Igor Korkhov Jan 06 '12 at 11:02
  • Still not clear what must be separated by the colon, and what should be left unchanged. – Igor Korkhov Jan 06 '12 at 12:18
  • Well Igor, inside each of these two font-color tags, there is one occurrence of a lowercase letter followed by an uppercase letter that needs separating with a colon. All other occurrences outside these two font-color tags are left unchanged. – Niamh Doyle Jan 06 '12 at 12:34

This question is ages old, but I stumbled upon it, so someone else might as well. The OP's comment on Igor's response clarified how the task was meant to be described (and could have been added to the question).

To match only those font-specific lines of the HTML, replace

(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-z])([A-Z])

with \1: \2

Explanation:

  • (?<=[fixed-length regex]) is a positive lookbehind and means "only match if this text appears immediately before the match".
  • (?:48B700|C0C0C0) is a non-capturing group that matches only those two colours. Since both alternatives have the same length, the group can be used inside a lookbehind, which must be of fixed length.
  • (.*?[a-z])([A-Z]) matches everything after the > of those opening font tags, up to and including the first uppercase letter that directly follows a lowercase one.
  • The \1: \2 replacement is the same as in Igor's response, except that \1 now holds the entire text before the split point.
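
The lookbehind approach can be tried out with a short Python sketch. This assumes Python's re module, which also requires fixed-width lookbehinds but is a different engine from TextWrangler's; the name split_in_font_tags is hypothetical.

```python
import re

# The match must start right after one of the two opening font tags.
# Both colour codes are six characters long, so the lookbehind has a fixed width.
TAGGED = re.compile(r"(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-z])([A-Z])")

def split_in_font_tags(html: str) -> str:
    """Add ': ' at the first lowercase/uppercase boundary after each matching tag."""
    return TAGGED.sub(r"\1: \2", html)

line = "<font color =#48B700>  - Stột jlăm wẻ baOne hundred and three<br></font>"
print(split_in_font_tags(line))
# <font color =#48B700>  - Stột jlăm wẻ ba: One hundred and three<br></font>

other = "<font color =#FF0000>FileMaker</font>"
print(split_in_font_tags(other))  # unchanged: the colour is not one of the two
```

Because every match has to begin directly after the tag, only the first boundary on each tagged line is changed; a later CamelCase word on the same line is left alone.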

Addition:

Your input strings contain accented characters, and the part you want to split may well end in one. In that case [a-z] alone won't catch it. You will need a character range that covers all the letters you care about, something like

(?<=<font color =#(?:48B700|C0C0C0)>)(.*?[a-zḁ-ῼ])([A-Z])
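
An explicit range is easy to get wrong, so when the text can be processed outside the editor, another option is to let the language classify the letters. A minimal Python sketch follows; the helper name split_case_boundary is hypothetical, it assumes accented letters are stored as single precomposed code points, and it ignores the font-tag restriction in order to show only the letter test.

```python
def split_case_boundary(text: str) -> str:
    """Insert ': ' wherever a lowercase letter is directly followed by an uppercase one.

    str.islower() and str.isupper() use Unicode case properties, so accented
    letters count as lowercase or uppercase without being listed in a range.
    """
    out = []
    # Pair every character with its predecessor; a space stands in before the first.
    for prev, cur in zip(" " + text, text):
        if prev.islower() and cur.isupper():
            out.append(": ")
        out.append(cur)
    return "".join(out)

print(split_case_boundary("w\u1ebbOne"))  # \u1ebb is the precomposed letter 'ẻ'
```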

Alex Constantin

Replace ([a-z])([A-Z]) with \1: \2 (note the space after the colon). I don't have TextWrangler, but this works in Notepad++.

The parentheses capture the matched letters, which the replacement string then refers to as \1 and \2.

Amarghosh
  • Thanks, Amarghosh. But it still does not work. Anyway, my document contains HTML tags and the expression seems to include everything between the font tags. – Niamh Doyle Jan 06 '12 at 10:36
  • Thanks, but still no luck in TextWrangler. I don't have Notepad++ for Mac :( to try. – Niamh Doyle Jan 06 '12 at 10:44

[a-z][A-Z] is the correct pattern for a lowercase letter followed by an uppercase one; however, you need to tick the Case Sensitive option in the Find/Replace dialog for it to work.
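
The effect of case sensitivity is easy to reproduce outside the editor. The sketch below uses Python's re module as a stand-in (an assumption; it is not TextWrangler's engine), where re.IGNORECASE plays the role of leaving the Case Sensitive box unticked.

```python
import re

text = "This announcement meansStudents are welcome."

# Case-sensitive (Python's default): a lowercase letter followed by an uppercase one.
sensitive = re.findall(r"[a-z][A-Z]", text)

# Case-insensitive: both classes match any ASCII letter, so any adjacent pair matches.
insensitive = re.findall(r"[a-z][A-Z]", text, flags=re.IGNORECASE)

print(sensitive)         # ['sS']
print(len(insensitive))  # far more than one match
```

This matches the symptom reported in the comments, where the pattern "just finds any adjacent letters".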

Joshua Cook