I need to match a linebreak that falls between double quotes, as in:

<p class="calibre1">“This is the first sentence.</p>
<p class="calibre1">And this is the second!”</p>

This would match </p> <p class="calibre1">

I got this working with the regex (?<=“[^”]*)</p>\s*<p[^>]*>(?!“), but when I try to use it non-manually I get the error described in the title: "Invalid regular expression: look-behind requires fixed-width pattern". I need this regex for Calibre, the eBook management and editing program, which uses Python for its regex engine. The regex works when I search a book manually, but when I add it as a "common option" (run on every eBook conversion), I get that error.

I don't see how to do this without a variable-width look-behind, since you can't know how far the left double quote is from the linebreak. Help would be much appreciated!

Zout
  • And before you continue, please consider that HTML is not a regular language so unless you are parsing a minimal subset of the language, try to use something other than regular expressions. – msvalkon May 21 '14 at 11:07
  • @msvalkon There aren't any other options, since as I said, I'm using Calibre, an eBook editing program. There is no option but to use regular expressions for this situation. – Zout May 21 '14 at 11:10

2 Answers

Python's re module, like most regex engines (with the notable exception of .NET), doesn't support variable-length lookbehind.

Can't you use a capturing group instead?

“[^”]*(</p>\s*<p[^>]*>)

The linebreak you want is then in the first capturing group.
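As a rough sketch of how this could look in plain Python (outside Calibre), the snippet below applies the pattern above to the sample from the question and reads group 1. The second pattern, `join_pattern`, is not from the answer: it is an assumed variant that captures the text before the break instead, so the break can be dropped in a substitution. Joining the two halves with a single space is also an assumption.

```python
import re

# Sample markup from the question.
html = (
    '<p class="calibre1">“This is the first sentence.</p>\n'
    '<p class="calibre1">And this is the second!”</p>'
)

# Pattern from the answer: consume the quoted text, capture only the break.
pattern = re.compile(r'“[^”]*(</p>\s*<p[^>]*>)')
match = pattern.search(html)
paragraph_break = match.group(1) if match else None

# Assumed variant for replacing: capture the text to keep and drop the break.
# The single space used to join the two halves is an illustrative choice.
join_pattern = re.compile(r'(“[^”]*)</p>\s*<p[^>]*>')
joined = join_pattern.sub(r'\1 ', html)

print(repr(paragraph_break))
print(joined)
```

Because `[^”]*` is greedy, a quote that spans more than two paragraphs would probably have only its last break removed per pass, so the substitution may need to be repeated until nothing changes.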

Robin
  • Good idea! It isn't very pretty, but matching `“([^”]*)</p>\s*<p[^>]*>(?!“)` and replacing with `“\1` seems to work. – Zout May 21 '14 at 13:43
  • @Zajora Oh, so that's what you wanted to do? Btw, `<p[^>]*>(?!“)` means "`<p>` tags not followed *directly* by a `“`". Is that what you want? – Robin May 21 '14 at 13:56
  • Yeah. It does seem a bit weird, but I had to add `(?!“)` since some eBooks have the quotes reversed (probably some kind of conversion error).. or at least, I think that was the issue. Now that I'm looking at it I'm starting to wonder.. – Zout May 21 '14 at 14:55
In Python's re module, the pattern inside a lookbehind must have a fixed width, so variable-length quantifiers such as * and + are not allowed inside it.
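A small check, assuming the standard-library re module, makes the restriction visible. The helper name `compiles` is made up for this illustration. A lookbehind with a literal or a fixed repeat count compiles, while the one from the question raises re.error.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the re module accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width lookbehinds: a literal, and a fixed repeat count.
fixed_literal_ok = compiles(r'(?<=“abc)</p>')
fixed_count_ok = compiles(r'(?<=“.{3})</p>')

# Variable-width lookbehind from the question: rejected by re.
variable_ok = compiles(r'(?<=“[^”]*)</p>')

print(fixed_literal_ok, fixed_count_ok, variable_ok)
```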

aelor