import re
testString = ("<h2>Tricks</h2>"
              "<a href=\"#\"><i class=\"icon-envelope\"></i></a>")
re.sub("(?<=[<h2>(.+?)</h2>\\s+])<a href=\"#\"><i class=\"icon-(.+?)\"></i></a>", "{{ \\1 @ \\2 }}", testString)

This produces: invalid group reference.

Using only \\1 in the replacement extracts only envelope, which makes me think the lookbehind is ignored. Is there a way to capture something from inside a lookbehind?

I want to produce:

<h2>Tricks</h2>
{{ Tricks @ envelope }}
Martijn Pieters
tomsseisums
  • You created a character class (a set of characters that is allowed to match) consisting of `<`, `h`, `2`, `>`, etc. there. Don't use `[..]` unless you want to create a set of characters for a match (`\s`, `\d`, etc. are pre-built character classes). – Martijn Pieters Feb 06 '13 at 15:16
  • Looks like you *really* want to use an HTML parser instead. Mixing regular expressions and HTML gets real painful, really really fast. – Martijn Pieters Feb 06 '13 at 15:18
  • I am trying to write a complex F&R for Sublime Text editor, to replace some of the stuff within my files. And, without that `[..]`, `.search` found nothing. – tomsseisums Feb 06 '13 at 15:22
  • Without the character class, the lookbehind is not allowed because you are not allowed to use variable-width patterns in a lookbehind (no `+` or `*`). *With* the character class, the lookbehind no longer matches what you think it matches. – Martijn Pieters Feb 06 '13 at 15:23
  • @psycketom ST2 isn't stopping you from using an HTML library if it's more suited to your purposes for this F&R :) (of course, you could look at the `regex` library, which supports variable-length lookahead/lookbehind assertions) – Jon Clements Feb 06 '13 at 15:23
  • @JonClements Could you point me in the direction of an HTML library? I have never seen such a plugin before. – tomsseisums Feb 06 '13 at 15:30
  • Look at http://www.crummy.com/software/BeautifulSoup/ - have a play with that – Jon Clements Feb 06 '13 at 15:32
  • @MartijnPieters, mind adding an answer so I can accept? – tomsseisums Feb 06 '13 at 16:00
  • @psycketom: There you go, expanded into an answer. – Martijn Pieters Feb 06 '13 at 16:05

1 Answer

Looks like you really want to use an HTML parser instead. Mixing regular expressions and HTML gets real painful, really really fast.

In your regular expression, you created a character class (a set of characters that is allowed to match) consisting of <, h, 2, >, etc. here:

[<h2>(.+?)</h2>\s+]

which could have been written as:

[<>h2()+.?/\s]

and it would match the same characters.

Don't use [..] unless you want to create a set of characters for a match (\s, \d, etc. are pre-built character classes).
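Both points above can be checked with a short sketch using Python's standard re module: the two classes accept exactly the same single characters, and because the parentheses inside [...] are literal characters rather than a group, the question's pattern defines only one capturing group. That is why \2 in the replacement is rejected as an invalid group reference. The variable names are illustrative only.

```python
import re
import string

# The bracketed "lookbehind" body from the question, and the shorter
# class that is claimed to be equivalent.
original_class = r"[<h2>(.+?)</h2>\s+]"
short_class = r"[<>h2()+.?/\s]"

# Each class matches exactly one character; compare them on every
# printable ASCII character.
same = all(
    bool(re.fullmatch(original_class, ch)) == bool(re.fullmatch(short_class, ch))
    for ch in string.printable
)
print(same)

# Inside [...] the parentheses are literal, so the only capturing group
# in the question's pattern is the one after "icon-".
question_pattern = (
    r'(?<=[<h2>(.+?)</h2>\s+])<a href="#"><i class="icon-(.+?)"></i></a>'
)
print(re.compile(question_pattern).groups)

# Reproduce the question's error: \2 refers to a group that does not exist.
test_string = '<h2>Tricks</h2><a href="#"><i class="icon-envelope"></i></a>'
try:
    re.sub(question_pattern, r"{{ \1 @ \2 }}", test_string)
except re.error as exc:
    print(exc)
```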

However, even if you were to remove the brackets, the lookbehind would not be allowed: Python's re module does not accept variable-width patterns in a lookbehind (no + or *). So with the character class, the lookbehind no longer matches what you think it matches, and without it, the lookbehind is not permissible.
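The first half of this sketch reproduces the compile error once the brackets are removed. The second half is not part of the answer (which recommends a parser); it is one possible regex alternative, assuming the heading and the icon link sit next to each other as in the question: capture the heading in an ordinary group and write the <h2> back in the replacement, rather than trying to capture from inside a lookbehind.

```python
import re

# Without brackets the lookbehind contains .+? and \s+, which are
# variable-width, so Python's re module refuses to compile it.
try:
    re.compile(r"(?<=<h2>(.+?)</h2>\s+)<a")
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Alternative (an assumption, not from the answer): match the heading
# with a normal capturing group and re-emit it in the replacement.
test_string = ("<h2>Tricks</h2>"
               '<a href="#"><i class="icon-envelope"></i></a>')
pattern = r'<h2>(.+?)</h2>\s*<a href="#"><i class="icon-(.+?)"></i></a>'
result = re.sub(pattern, r"<h2>\1</h2>\n{{ \1 @ \2 }}", test_string)
print(result)
```

This still breaks as soon as the markup varies (extra attributes, different quoting, nested tags), which is the reason the answer points to a parser.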

All in all, just use BeautifulSoup instead.
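BeautifulSoup is a third-party package; as a dependency-free sketch of the same parser-based idea, the code below uses the standard library's html.parser.HTMLParser instead. The class IconHeadingParser and the function icon_placeholders are hypothetical names, and the sketch assumes each <i class="icon-..."> belongs to the most recent <h2> that precedes it.

```python
from html.parser import HTMLParser


class IconHeadingParser(HTMLParser):
    """Collect (heading text, icon name) pairs from HTML.

    Hypothetical rule for this sketch: an <i class="icon-NAME"> element
    is paired with the text of the most recent <h2> seen before it.
    """

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.current_heading = ""
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.current_heading = ""
        elif tag == "i":
            # attrs is a list of (name, value) tuples; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            for cls in classes:
                if cls.startswith("icon-"):
                    self.pairs.append((self.current_heading, cls[len("icon-"):]))

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_h2:
            self.current_heading += data


def icon_placeholders(html):
    """Return one "{{ heading @ icon }}" string per icon found."""
    parser = IconHeadingParser()
    parser.feed(html)
    parser.close()
    return ["{{ %s @ %s }}" % pair for pair in parser.pairs]


test_string = ("<h2>Tricks</h2>"
               '<a href="#"><i class="icon-envelope"></i></a>')
print(icon_placeholders(test_string))
```

Because the parser reads attributes and text structurally, extra attributes or class names on the <i> element do not break it the way they would break a hand-written regex.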

Martijn Pieters