
I have the following:

itemid=44'>Red Flower</a>

I need it to be this:

_ITEMID_START_44_ITEMID_END_

Can this be done with regular expressions? I need to keep the id (44 in the example), and replace everything on the left with _ITEMID_START_ and everything on the right with _ITEMID_END_.

Note: The itemid is one or two digits, never more than two.

I found something about tagged regular expressions and backreferences which seems like it would work but the syntax is killing me.

I tried this (and other attempts):

Find What: ^(\bitemid=\b)^([0-9][0-9]^)\b'>\b[a-z]+\b</a>\b)
Replace With: ^(\b_ITEMID_START_\b^2^(\b_ITEMID_END_\b

I am using UltraEdit to do the find and replace in over 20,000 *.html files. Any help would be very much appreciated.

Mark

3 Answers


The regex below matches the whole line and captures only the digits that come right after itemid=. In the replacement, the whole line is replaced with _ITEMID_START_\1_ITEMID_END_, where \1 stands for the first captured group (the backreference syntax may vary between tools and languages).

.*(?<=\bitemid=)([0-9]{1,2}).*

And the substitution would be,

_ITEMID_START_\1_ITEMID_END_



If you want to replace only

itemid=44'>Red Flower</a>

with

_ITEMID_START_44_ITEMID_END_

Then your regex would be,

\bitemid=([0-9]{1,2})\'>[^<]*<\/a>

And the substitution would be,

_ITEMID_START_\1_ITEMID_END_
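
As a quick way to check this pattern outside the editor, here is a minimal sketch using Python's `re` module as a stand-in for a Perl-style engine. The function name `replace_item_links` is hypothetical, and UltraEdit's own behavior may differ in details.

```python
import re

# Pattern from the answer above: one or two digits after "itemid=",
# then the closing quote, the link text, and the closing </a> tag.
# The backslashes before ' and / are dropped; they are not needed here.
ITEM_LINK = re.compile(r"\bitemid=([0-9]{1,2})'>[^<]*</a>")


def replace_item_links(text: str) -> str:
    """Replace each matched link tail with the placeholder, keeping the id."""
    # \g<1> is Python's unambiguous spelling of the \1 backreference.
    return ITEM_LINK.sub(r"_ITEMID_START_\g<1>_ITEMID_END_", text)


if __name__ == "__main__":
    print(replace_item_links("itemid=44'>Red Flower</a>"))
```

Because the pattern requires the quote right after at most two digits, an id with three or more digits is left untouched rather than partially replaced.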
Avinash Raj

You can try this:

Find What:    \bitemid=([0-9][0-9]?)'>[^<]*</a>
Replace With: _ITEMID_START_\1_ITEMID_END_

A replacement string is a normal string: all the regex special characters (except for the backreference) lose their special meaning.

\b, the word boundary, is the position between a character from the \w character class (a shortcut for [A-Za-z0-9_]) and any other character, or the start or end of the string.
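
To illustrate what the leading \b buys you, here is a small sketch in Python's `re` module (used only as a convenient Perl-style engine; the helper name `has_itemid_token` is made up for this example). The boundary stops the pattern from matching when itemid is the tail of a longer word.

```python
import re


def has_itemid_token(text: str) -> bool:
    """True if 'itemid=' appears with a word boundary in front of it."""
    return re.search(r"\bitemid=", text) is not None


if __name__ == "__main__":
    for sample in ("page.php?itemid=44", "itemid=44", "myitemid=44", "_itemid=44"):
        print(sample, has_itemid_token(sample))
```

The first two samples match because `?` and the start of the string are not word characters. The last two do not, since `y` and `_` both belong to \w.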

Note: I can't try it with UltraEdit. If you get a literal \1 in the output, replace it with $1.

Casimir et Hippolyte

The solution from Casimir et Hippolyte and the first solution from Avinash Raj both work in UltraEdit when Perl is selected as the regular expression engine. The second search string from Avinash Raj works in UltraEdit only after removing the backslash to the left of the ' character.

UltraEdit has 3 regular expression engines: UltraEdit, Unix and Perl.

The search string in the question is a mixture of UltraEdit and Perl regular expression syntax and therefore does not work.

With the UltraEdit regular expression engine:

Find what: itemid=^([0-9]+^)*</a>
Replace with: _ITEMID_START_^1_ITEMID_END_

With the Unix or Perl regular expression engine:

Find what: itemid=([0-9]+).*</a>
Replace with: _ITEMID_START_\1_ITEMID_END_

Safer because the match is non-greedy, but only available with the Perl regex engine:

Find what: itemid=(\d+).*?</a>
Replace with: _ITEMID_START_\1_ITEMID_END_
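
The difference between the greedy and non-greedy variants only shows up when a line contains more than one link. The sketch below uses Python's `re` module as a stand-in for a Perl-style engine; the sample line is invented for illustration and UltraEdit itself may differ in details.

```python
import re

REPLACEMENT = r"_ITEMID_START_\g<1>_ITEMID_END_"

# Hypothetical line with two links on it.
line = "itemid=44'>Red Flower</a> and itemid=7'>Blue Rose</a>"

# Greedy .* runs to the LAST </a> on the line, swallowing the second link.
greedy = re.sub(r"itemid=([0-9]+).*</a>", REPLACEMENT, line)

# Non-greedy .*? stops at the FIRST </a>, so each link is handled separately.
lazy = re.sub(r"itemid=(\d+).*?</a>", REPLACEMENT, line)

print(greedy)
print(lazy)
```

With the greedy pattern the second id is lost, while the non-greedy pattern keeps both ids.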

IDM published the power tips "Tagged expressions" for the UltraEdit regex engine and "Perl regular expressions: Backreferences" for the Perl regex engine.

Mofi
  • Thank you so much. The three choices in UE confused me as I don't thoroughly understand basic regular expressions, much less having a choice of three :) Your example using UE's regular expression worked great. – Mark Jul 02 '14 at 14:19