
I would like to understand how I could use LPeg to replace strings only when they are NOT between a certain start and end delimiter. Below is an example in which SKIPstart and SKIPstop mark a region where text should not be replaced.

rep
rep
SKIPstart
rep
rep
SKIPstop
rep
rep

to

new
new
SKIPstart
rep
rep
SKIPstop
new
new

Here is another example with multiple delimiters:

rep
rep
SKIPstart
rep
rep
SKIPstop
rep
rep
SKIPstart
rep
rep
SKIPstop

to

new
new
SKIPstart
rep
rep
SKIPstop
new
new
SKIPstart
rep
rep
SKIPstop

And an example with nested delimiters:

rep
rep
SKIPstart
rep
SKIPstart
rep
SKIPstop
rep
SKIPstop
rep
rep

to

new
new
SKIPstart
rep
SKIPstart
rep
SKIPstop
rep
SKIPstop
new
new
likethevegetable

2 Answers

1

Sorry, I don't know LPeg, but your task is easily solvable with standard Lua patterns.
IMO, LPeg and other external regex libraries are overkill in most cases; Lua patterns are surprisingly good enough.

local s = [[
rep
rep
SKIPstart
rep
rep
SKIPstop
rep
rep
SKIPstart
rep
SKIPstart
rep
SKIPstop
rep
SKIPstop
rep
rep
]]
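s = s:gsub("SKIPstart", "\1%0")       -- put \1 before each opening delimiter
     :gsub("SKIPstop", "%0\2")        -- put \2 after each closing delimiter
     :gsub("%b\1\2", "\0%0\0")        -- wrap each balanced \1...\2 block in \0
     :gsub("(%Z*)%z?(%Z*)%z?",        -- a = text outside a block, b = the block itself
         function(a, b) return a:gsub("rep", "new")..b:gsub("[\1\2]", "") end)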
print(s)

Output:

new
new
SKIPstart
rep
rep
SKIPstop
new
new
SKIPstart
rep
SKIPstart
rep
SKIPstop
rep
SKIPstop
new
new
Egor Skriptunoff
  • Might take a while for me to digest this.. but would it work with multiple delimiter? How about nested? What does the `\1%0` mean? Just a little internal flag you used to mask the delimiter? – likethevegetable Dec 18 '21 at 02:37
  • `would it work with multiple delimiter?` - Please add example of multiple delimiters to the question. – Egor Skriptunoff Dec 18 '21 at 02:38
  • `Just a little internal flag you used to mask the delimiter?` - Yes, that's a workaround for impossibility to write `(abc)?` in Lua patterns – Egor Skriptunoff Dec 18 '21 at 02:40
  • I added some more examples. – likethevegetable Dec 18 '21 at 02:44
  • `would it work with multiple delimiter? How about nested?` - Yes. Yes. – Egor Skriptunoff Dec 18 '21 at 02:52
  • I think I'm on the way to understanding it... can you please explain it though? – likethevegetable Dec 18 '21 at 03:11
  • Which line of code needs explanation? – Egor Skriptunoff Dec 18 '21 at 03:31
  • The first two `gsub`s are wrapping the delimiters with `\1 ... \2`, then we use those to form a boundary with the third `gsub` and wrap it with `\0`, then I don't get the complicated `"(%Z*)%z?(%Z*)%z?"` part.. I think `%Z` would match `\1|\2|...` but not `\0`. but I don't see how we target captures outside of the delimiters.. or is `%Z` any character but `\0`? – likethevegetable Dec 18 '21 at 03:45
  • I think I get it now. We capture two parts, a and b, which essentially capture every character but `\0` with `(%Z*)`. We don't want to replace anything after the first `\0` (which is b), so in the function, we only gsub on a. Then clean up the `\1` and `\2`. Very clever. Thanks! – likethevegetable Dec 18 '21 at 04:34
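
To make the walkthrough in the comments above concrete, here is a minimal sketch that runs the same four `gsub` calls one at a time and prints each intermediate string. The helper names `show` and `stages` are made up for this illustration and are not part of the answer. It relies on `%z`/`%Z`, the Lua 5.1 class for the zero byte, which is deprecated since Lua 5.2.

```lua
-- Render the marker bytes \0, \1, \2 as <0>, <1>, <2> so they can be printed.
local function show(str)
  return (str:gsub(".", function(c)
    local b = c:byte()
    if b <= 2 then return "<" .. b .. ">" end
  end))
end

-- Run the chain one gsub at a time and return every intermediate string.
local function stages(text)
  local s1 = text:gsub("SKIPstart", "\1%0")  -- \1 before each opening delimiter
  local s2 = s1:gsub("SKIPstop", "%0\2")     -- \2 after each closing delimiter
  local s3 = s2:gsub("%b\1\2", "\0%0\0")     -- \0 around each balanced block
  local s4 = s3:gsub("(%Z*)%z?(%Z*)%z?", function(a, b)
    -- a: text outside a block, b: the protected block (may be empty)
    return a:gsub("rep", "new") .. b:gsub("[\1\2]", "")
  end)
  return { s1, s2, s3, s4 }
end

for i, stage in ipairs(stages("rep SKIPstart rep SKIPstop rep")) do
  print(i, show(stage))
end
```

For this input, stage 3 reads `rep <0><1>SKIPstart rep SKIPstop<2><0> rep`, and stage 4 is the final `new SKIPstart rep SKIPstop new`. Because `%b\1\2` matches balanced pairs, a nested block is wrapped only once, at its outermost level.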
0

Egor Skriptunoff's answer is a great way of playing tricks with standard Lua patterns to achieve your goal. I agree that when a straightforward approach works, there is no need to reach for LPeg or other external libraries.

As you asked about LPeg, I'll show you how you can do it with LPeg.

local re = require('re')  -- LPeg's re module is normally installed as 're'

local defs = {
  do_rep = function(p)
    return p:gsub('rep', 'new')
  end
}

local pat = re.compile([=[--lpeg
  all <- {~ ( (!delimited . [^S]*)+ -> do_rep / delimited )* ~}
  delimited <- s (!s !e . / delimited)* e
  s <- 'SKIPstart'
  e <- 'SKIPstop'
]=], defs)

local s = [[
rep
rep
SKIPstart
rep
rep
SKIPstop
rep
rep
SKIPstart
rep
SKIPstart
rep
SKIPstop
rep
SKIPstop
rep
rep
]]

s = pat:match(s)
print(s)
  • Certainly, picking up the trick of using `%z`, which was formerly unknown to me, has already made things much easier. I will take some time to digest this and approve it as the answer. Out of curiosity though, I've seen tables be used in LPeg (pointers to sub-expressions). Why use the re.compile instead of a table? Thanks – likethevegetable Dec 20 '21 at 15:21
  • I personally prefer LPeg.re over bare LPeg, since it's easier to read and write. – Brynne Taylor Dec 22 '21 at 02:11
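
For comparison with the table form asked about in the comments, here is a rough sketch of a similar grammar written with plain LPeg. It is not the answer's code: instead of passing each outside chunk to `gsub`, it replaces `rep` directly with a string capture inside `lpeg.Cs`. The function name `replace_outside` is made up for this illustration, and the sketch assumes the `lpeg` module is installed.

```lua
local lpeg = require("lpeg")
local P, V, Cs = lpeg.P, lpeg.V, lpeg.Cs

local start, stop = P("SKIPstart"), P("SKIPstop")

local grammar = P{
  "all",
  -- Cs rebuilds the subject, substituting the value of each nested capture.
  -- Order matters: try a whole protected block first, then a replacement,
  -- then fall back to copying a single character unchanged.
  all = Cs((V"delimited" + (P"rep" / "new") + P(1))^0),
  -- A block may contain nested blocks or any text that is not a delimiter.
  -- It holds no captures, so Cs copies it through untouched.
  delimited = start * (V"delimited" + (P(1) - start - stop))^0 * stop,
}

-- Hypothetical wrapper name, used only in this sketch.
local function replace_outside(text)
  return grammar:match(text)
end

print(replace_outside("rep SKIPstart rep SKIPstart rep SKIPstop rep SKIPstop rep"))
```

The table keys play the role of the rule names in the `re` grammar, and `V"delimited"` is the reference to a sub-expression. An unbalanced `SKIPstart` does not match `delimited`, so in this sketch it is treated as ordinary text.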