Regex to find a lowercase letter followed by an uppercase letter between HTML tags

Question

I want to use a regular expression in TextWrangler to find a lowercase letter immediately followed by an uppercase letter between these HTML font-color tags. For example:

<font color =#0B610B> Word word wordWord </font>
<font color =#C0C0C0> Word word wordWord </font>

I want each such pair to be split by a colon and a space, like this:

<font color =#0B610B> Word word word: Word </font>
<font color =#C0C0C0> Word word word: Word </font>

I have used:

<font color =#0B610B\b[^>]*>(.*?)</font>

But it finds everything between the font-color tags.

I have also tried:

<font color =#0B610B\b[^>]*>([a-z])([A-Z])</font>

But it does not work.

Could anyone help me? Thank you very much.

Might this be a duplicate of [this question](http://stackoverflow.com/questions/8775419/find-lowercase-immediately-followed-by-uppercase)? The context is a bit different, but effectively the same problem. — Tim Post, Jan 09 '12 at 08:22

score 0 · Answer 1 · answered Aug 28 '12 at 22:34

This question has not been marked as Answered. If you still have not found an adequate answer, you can try this:

Given the following examples, only lines 1, 2, and 3 should "match" your criteria. Line 4 should NOT match, since there is no "lowercase-Uppercase" combination. Line 5 should also not match because the font color (#FFFFFF) does not match what you specified (in the OP as well as follow-up comments).

<font color =#0B610B> Word word wordWord </font>
<font color =#C0C0C0> Word word wordWord </font>
<font color =#C0C0C0> wordWord wordWordwordWord </font>
<font color =#0B610B> word word word Word Word Word Wordword </font>
<font color =#FFFFFF> Word word wordWord </font>

The search term could be written like this:

(?<=font color =#(?:0B610B|C0C0C0)>)((?:(?!</font>|[\r\n]).)*[a-z])([A-Z])

The replacement term could be written like this:

\1: \2

The search term has several nested parentheses. The first, (?<=...), is a lookbehind: it finds the opening font tag on the left and starts the match immediately to the right of it. Inside it, (?:0B610B|C0C0C0) matches either of your specified font colors (you can add more by adding more "|" pipes) and does not store them in one of the \# registers (like \1 or \2).

There are then 3 opening ('s. The first is a capturing group, which is referenced as \1 in the replacement. The third (skipping the 2nd for now), which looks like (?!...), is a negative lookahead: it checks that the characters just to the right of the current position are NOT the closing </font> tag and not any kind of newline character. While that condition is true, the . character advances the match by one character, and the check is repeated. This continues until the closing </font> tag is reached. The engine then backtracks to the last lowercase letter that is directly followed by an uppercase letter; that uppercase letter is captured as \2.

The 2nd group, (?:...), is non-capturing: it only wraps the lookahead and the . so that they can be repeated together with *, without storing anything in a register of its own. The text it consumes still ends up in \1, which therefore holds everything between the tags, excluding the tags themselves.

Finally, in the replacement term, \1 pastes back the text from the right of the opening tag up to and including the lowercase letter, then a colon and a space are inserted, and \2 pastes back the uppercase letter. Because the search is greedy, each pass only splits the last lowercase-uppercase pair inside a tag, so you may have to run this replacement multiple times when a single line contains several pairs, such as wordWordWordWord.
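As a rough way to try the search and replacement terms from this answer outside TextWrangler, here is a minimal JavaScript sketch. It assumes an engine with lookbehind support (for example a recent Node.js); the helper name splitCamelInFontTags is made up for illustration, and the loop stands in for running the replacement by hand several times.

```javascript
// Search term from the answer; only the "/" in </font> is escaped for a JS regex literal.
const search = /(?<=font color =#(?:0B610B|C0C0C0)>)((?:(?!<\/font>|[\r\n]).)*[a-z])([A-Z])/g;

// Hypothetical helper: repeat the replacement until the text stops changing,
// because each pass only splits the last lowercase-uppercase pair per tag.
function splitCamelInFontTags(text) {
  let previous;
  do {
    previous = text;
    text = text.replace(search, "$1: $2"); // same as the \1: \2 replacement term
  } while (text !== previous);
  return text;
}

const sample = [
  "<font color =#0B610B> Word word wordWord </font>",
  "<font color =#C0C0C0> wordWord wordWordwordWord </font>",
  "<font color =#FFFFFF> Word word wordWord </font>",
].join("\n");

console.log(splitCamelInFontTags(sample));
```

The line with #FFFFFF is left alone because the lookbehind only accepts the two listed colors.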

jaypal singh · Answer 2 · 2012-01-07T08:53:00.413

How about doing a positive lookahead, something like this:

[a-z](?=[A-Z])

I don't have TextWrangler, but you can use this to match the lowercase letter and then add your colon and space. I tested this regex in Perl and it looks OK.

[jaypal:~/Temp] cat temp
<font color =#0B610B> Word word wordWord </font>
<font color =#C0C0C0> Word word wordWord </font>

[jaypal:~/Temp] perl -pe 's/([a-z])(?=[A-Z])/$1: /' temp
<font color =#0B610B> Word word word: Word </font>
<font color =#C0C0C0> Word word word: Word </font>

Update: I forgot I have BBEdit, which is the big brother of TextWrangler. Here it is in action.

Update2: Here it is in action in TextWrangler.

edited Jan 07 '12 at 08:53

answered Jan 07 '12 at 08:27

jaypal singh

Nope, it does not seem to work in TextWrangler. I tried ([a-z])(?=[A-Z]) but it did not run. Thanks. – Niamh Doyle Jan 07 '12 at 08:48
It works in TextWrangler. See my second screenshot. Make sure you have `grep`, `case-sensitive` and `wrap around` selected. – jaypal singh Jan 07 '12 at 08:53
Many thanks, Jaypal. I forgot to let you know that there are some exceptions (such as kW, PhD...). But more importantly, I only want to edit what occurs between the two specific font-color tags ( and ). – Niamh Doyle Jan 07 '12 at 08:55
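The two follow-up requirements in the last comment (skip exceptions such as kW and PhD, and only touch text inside the specific font-color tags) could be combined with the lookahead idea roughly as follows. This is only a sketch in JavaScript, not something tested in TextWrangler: the two colors are taken from the question, the exception list and the name addColons are assumptions, and it assumes the text between the tags contains no nested tags.

```javascript
// Assumed exception list, based on the examples in the comment; extend as needed.
const exceptions = new Set(["kW", "PhD"]);

// Opening tag for one of the two colors, tag-free text, closing tag.
const fontBlock = /(<font color =#(?:0B610B|C0C0C0)>)([^<]*)(<\/font>)/g;

// Hypothetical helper: insert ": " after every lowercase letter that is
// directly followed by an uppercase letter, word by word, skipping exceptions.
function addColons(html) {
  return html.replace(fontBlock, (match, open, body, close) => {
    const fixed = body.replace(/[A-Za-z]+/g, (word) =>
      exceptions.has(word) ? word : word.replace(/([a-z])(?=[A-Z])/g, "$1: ")
    );
    return open + fixed + close;
  });
}

console.log(addColons("<font color =#0B610B> 5 kW PhD wordWord </font>"));
```

Checking whole words against the list keeps the exception handling out of the regex itself, which may be easier to maintain than a long negative lookahead.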

score 0 · Answer 3 · edited Jan 07 '12 at 11:59

Try this:

<font.*?>.*?[a-z][A-Z].*?

edited Jan 07 '12 at 11:59

menjaraz

answered Jan 07 '12 at 08:28

shift66

Thanks a lot, but this does not work either. There are other font tags in the text and it selects every occurrence. I tried .*?[a-z][A-Z].*? but it does not work. – Niamh Doyle Jan 07 '12 at 08:46

score 0 · Answer 4 · answered Jan 07 '12 at 11:40

How about this one:

<font[^>]*>[^<>]*([a-z][A-Z])[^<>]*</font>

answered Jan 07 '12 at 11:40

ynka

Thanks, but this does not find any occurrences. – Niamh Doyle Jan 07 '12 at 14:15
It must be a TextWrangler-specific thing then. I checked this in Notepad++ and in Java with your examples. Sorry I couldn't help. – ynka Jan 07 '12 at 20:11
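For anyone who wants to reproduce the check from the last comment in another engine, a small JavaScript sketch is below. It only shows how the pattern from this answer behaves on sample lines taken from the thread; it does not explain the TextWrangler result. The variable names are illustrative.

```javascript
// Pattern from this answer; only the "/" in </font> is escaped for a JS regex literal.
const pattern = /<font[^>]*>[^<>]*([a-z][A-Z])[^<>]*<\/font>/;

const samples = [
  "<font color =#0B610B> Word word wordWord </font>",
  "<font color =#0B610B> word word word Word Word Word Wordword </font>",
  "<font color =#FFFFFF> Word word wordWord </font>",
];

// For each line, report the captured lowercase-uppercase pair, or null if none.
const results = samples.map((line) => {
  const m = line.match(pattern);
  return m ? m[1] : null;
});

console.log(results);
```

Note that the pattern accepts any font tag, so the #FFFFFF line matches as well, and it captures the pair without splitting it, so a replacement would still need its own groups.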

score 0 · Answer 5 · answered Jan 13 '12 at 19:12

I don't think you can do it in a single regular expression, but provided you can loop through it:

<script type="text/javascript">
function checkscript() {
    var content = document.regexForm.input.value;
    // (any HTML tag; you could specify font)(text up to the next tag)(lowercase)(uppercase)(rest of the text up to the next tag)
    var pattern = /(<[^>]*?>)([^<]*)([a-z])([A-Z])([^<]*)/g;
    // Each pass splits one lowercase-uppercase pair per text run, so repeat until none remain.
    while (content.match(pattern)) {
        content = content.replace(pattern, "$1$2$3: $4$5");
    }
    document.regexForm.output.value = content;
}
</script>
<body>

<form name="regexForm">
    <textarea rows="10" cols="50" name="input">
            <font color =#0B610B> Word myWord<BR> wordWord </font>
            <font color =#C0C0C0> Word word wordWord </font>
    </textarea>
<BR>
<input type="button" value="run test regex" onClick="checkscript();return true;">
<BR><textarea rows="10" cols="50" name="output"></textarea>
</form>
</body>

this:

<font color =#0B610B> Word myWord<BR> wordWord </font>
<font color =#C0C0C0> Word word wordWord </font>

becomes:

<font color =#0B610B> Word my: Word<BR> word: Word </font>
<font color =#C0C0C0> Word word word: Word </font>
