In the code of an EPUB, I have this text:

<span>Capitulo 1 - Apple is red</span>
<span>Capitulo 2 - Milk is white</span>
<span>Capitulo 3 - Weeds are green</span>

I need to replace the "span" tags with "h1" tags and every instance of "Capitulo" with "Chapter", keeping the rest of the text unchanged. I tried this in calibre, with no luck:

Find: <span>Capitulo (/d+) * </span>
Replace: <h1>Chapter /1 * </h1>

What can I do?

Second question: if I had this text:

<span>Capitulo 1 - apple is red, 5 chicas</span>
<span>Capitulo 2 - milk is white, 6 chicas</span>
<span>Capitulo 3 - weeds are green, 7 chicas</span>

and I want to obtain:

<h1>Chapter1 - apple is red, 5 girls</h1>
<h2>Chapter2 - milk is white, 6 boys</h2>
<h3>Chapter3 - weeds are green, 7 men</h3>

How should I proceed?

Luigi P.
  • `Capitulo ([^<]*)` => `Chapter \1` – Wiktor Stribiżew Apr 16 '19 at 09:49
  • You can try this `Capitulo(\s*\d+.*?)<\/span>` [demo](https://regex101.com/r/ZsQcgv/1/) – Code Maniac Apr 16 '19 at 09:53
  • @WiktorStribiżew Isn't it necessary to escape `<`, `>` and `/`? – Ildar Akhmetov Apr 16 '19 at 09:53
  • @IldarAkhmetov None of these is a special char, so why escape? – Wiktor Stribiżew Apr 16 '19 at 09:54
  • @WiktorStribiżew Yes, correct, although `/` is a special char in PCRE, it's probably not the case in Calibre. – Ildar Akhmetov Apr 16 '19 at 09:56
  • @IldarAkhmetov `/` is [not special in PCRE](https://regex101.com/r/DggwZ8/1). `/` should only be escaped in regex literals, and text editors only use string patterns. – Wiktor Stribiżew Apr 16 '19 at 09:56
  • @CodeManiac It is not special in JS, it acts as a regex delimiter in regex literal notation and then it must be escaped. `new RegExp('/')` - no escaping necessary as `/` is **not a special regex metacharacter**, period. – Wiktor Stribiżew Apr 16 '19 at 09:58
  • @WiktorStribiżew Yes, when using RegExp with regex literal notation we need to escape it; and yes, "metacharacter" is the more accurate name. – Code Maniac Apr 16 '19 at 10:00
  • @CodeManiac Sure, but that does not make `/` "special" in the sense of a "special regex metacharacter". – Wiktor Stribiżew Apr 16 '19 at 10:00
  • @WiktorStribiżew Yeah, you're right, that is what I meant, but I used the wrong name. Thanks for pointing it out. – Code Maniac Apr 16 '19 at 10:01
  • @WiktorStribiżew Just out of curiosity, because your regex looks cleaner than what I have written: will your regex handle input like `Capitulo 1 - Apple is <> red`? – Code Maniac Apr 16 '19 at 10:07
  • @CodeManiac I'd say that your string has an error. Judging by the sample input, there should be no child tags inside the `span`s OP is targeting. `.*?` or `[\s\S]*?` would solve the issue you mention. Unless there might be nested `span` tags, but that is already out of scope here. If there are tags inside these `spans` I would not use a text editor to handle them. – Wiktor Stribiżew Apr 16 '19 at 10:20
  • @WiktorStribiżew The inside value is just text, and having `<` or `>` there is not an error, I guess (what I mean is, say I have an expression inside the span like x < y). Thanks for the clarification. – Code Maniac Apr 16 '19 at 10:22
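
The difference discussed in these comments can be checked with a short Python sketch. Python's `re` module is used here as a stand-in for calibre's regex search, which is an assumption, and the sample string containing `x < y` is made up for illustration.

```python
import re

# Hypothetical input: span text containing a literal "<", as raised in the comments.
text = "<span>Capitulo 1 - x < y</span>"

# Negated character class: [^<]* stops at the first "<", so "</span>" cannot
# follow at that position and the whole match fails.
strict = re.compile(r"<span>Capitulo ([^<]*)</span>")

# Lazy dot: .*? expands one character at a time until "</span>" is found,
# so a stray "<" inside the text is accepted.
lazy = re.compile(r"<span>Capitulo (.*?)</span>")

strict_result = strict.sub(r"<h1>Chapter \1</h1>", text)
lazy_result = lazy.sub(r"<h1>Chapter \1</h1>", text)

print(strict_result)  # input comes back unchanged
print(lazy_result)
```

Neither pattern copes with nested `span` tags, which the comments already place out of scope.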

1 Answer

You may use

Find: <span>Capitulo ([^<]*)</span>
Replace: <h1>Chapter \1</h1>
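
To check this pattern outside calibre, here is a minimal Python sketch. It assumes Python's `re` module behaves like calibre's regex search for this pattern. It also runs the pattern from the question to show that it matches nothing: `/d` and `/1` are literal text rather than `\d` and `\1`, and `*` is a quantifier, not a wildcard.

```python
import re

source = """<span>Capitulo 1 - Apple is red</span>
<span>Capitulo 2 - Milk is white</span>
<span>Capitulo 3 - Weeds are green</span>"""

# Pattern and replacement from the answer.
fixed = re.sub(r"<span>Capitulo ([^<]*)</span>", r"<h1>Chapter \1</h1>", source)

# Pattern from the question: "/d+" is a literal "/" followed by one or more "d",
# so it never matches the sample text and the input comes back unchanged.
attempt = re.sub(r"<span>Capitulo (/d+) * </span>", r"<h1>Chapter /1 * </h1>", source)

print(fixed)
print(attempt == source)
```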

See the regex demo and the Regulex graph.

The `([^<]*)` part matches zero or more characters other than `<`: `[^<]` is a negated character class, and the parentheses `(...)` form a capturing group whose contents are available in the replacement pattern through a backreference (`\1` in the replacement).
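
The capturing-group idea also extends to the second question, where the heading level and the translated word differ per line, so a fixed replacement string is not enough. One possible sketch, in Python, uses a replacement function. The `TRANSLATIONS` table and the rule "chapter N becomes `<hN>`" are assumptions read off the sample output, not something the question states. Calibre's editor offers a function mode for search and replace, but the code below is plain Python, not calibre's function API.

```python
import re

# Assumed lookup, taken from the sample output: the word replacing "chicas" per chapter.
TRANSLATIONS = {1: "girls", 2: "boys", 3: "men"}

# Group 1 captures the chapter number, group 2 the rest of the span text.
PATTERN = re.compile(r"<span>Capitulo (\d+)([^<]*)</span>")

def to_heading(match: re.Match) -> str:
    number = int(match.group(1))
    # Unknown chapter numbers keep "chicas" untouched.
    rest = match.group(2).replace("chicas", TRANSLATIONS.get(number, "chicas"))
    # Assumption: heading level equals the chapter number, and there is no
    # space after "Chapter", as in the sample output.
    return f"<h{number}>Chapter{number}{rest}</h{number}>"

source = """<span>Capitulo 1 - apple is red, 5 chicas</span>
<span>Capitulo 2 - milk is white, 6 chicas</span>
<span>Capitulo 3 - weeds are green, 7 chicas</span>"""

result = PATTERN.sub(to_heading, source)
print(result)
```

HTML only defines `h1` through `h6`, so the "level equals chapter number" rule would need a cap for longer books.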

Wiktor Stribiżew