
I have a string which can contain any number of the delimiter §\n. I would like to remove all delimiters from a string, except the last occurrence which should be left as-is. The last delimiter can be in three states: \n, §\n or §§\n. There will never be any characters after the last variable delimiter.

Here are 3 examples with the different state delimiters:

abc§\ndef§\nghi\n
abc§\ndef§\nghi§\n
abc§\ndef§\nghi§§\n

I would like to remove all delimiters except the last occurrence.

So the result of gsub for the three examples above should be:

abcdefghi\n
abcdefghi§\n
abcdefghi§§\n

Using regular expressions, one could use §\\n(?=.), which matches properly for all three cases using positive lookahead, as there will never be any characters after the last variable delimiter.

I know I could check if the string has the delimiter at the end, and then after a substitution using the Lua pattern §\n I could add the delimiter back onto the string. That is however a very inelegant solution to a problem which should be possible to solve using a Lua pattern alone.
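For reference, the workaround described above can be sketched roughly as follows. This is only an illustration: the helper name strip_all_but_last is made up here, and the sketch assumes the delimiter is a section sign followed by a real line feed.

```lua
-- Hypothetical helper illustrating the "strip everything, then re-append" workaround.
-- Assumption: the delimiter is '§' followed by a real line feed character.
local DELIM = '§\n'

local function strip_all_but_last(s)
    -- Remember whether the string ends with the full delimiter.
    local ends_with_delim = s:sub(-#DELIM) == DELIM
    -- Remove every delimiter; DELIM contains no pattern magic characters.
    local stripped = s:gsub(DELIM, '')
    -- Put the trailing delimiter back if it was there before.
    if ends_with_delim then
        stripped = stripped .. DELIM
    end
    return stripped
end

print(strip_all_but_last('abc§\ndef§\nghi§\n'))
```

It works, but it needs the extra check and the concatenation, which is exactly what I would like to avoid.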

So how could this be done using a Lua pattern?

Yu Hao
  • What is the Unicode code point / name of the character you use as a delimiter? – Devon Parsons Jun 19 '14 at 13:01
  • The string "§\n" (section sign followed by a line feed) is the delimiter. Unicode: U+00A7 followed by U+000A. –  Jun 19 '14 at 13:05
  • I would probably just capture the delimiter at the end first, gsub them all out, and then append the captured delimiter back on the end. – Etan Reisner Jun 19 '14 at 13:11
  • I am currently doing that in my production code but I would like to only use gsub without appending the delimiter afterwards. It's not a *bad* solution but I'm sure it could be done better with gsub alone, which is why I asked this question. –  Jun 19 '14 at 13:45

2 Answers


str:gsub( '§\\n(.)', '%1' ) should do what you want. It deletes a delimiter only when it is followed by another character, and the capture puts that character back into the string. The last delimiter has nothing after it, so it never matches and is left alone.

Test code

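-- The three sample inputs; '\\n' here is a literal backslash followed by 'n'.
local str = {
    'abc§\\ndef§\\nghi\\n',
    'abc§\\ndef§\\nghi§\\n',
    'abc§\\ndef§\\nghi§§\\n',
}

for i = 1, #str do
    -- The extra parentheses drop gsub's second return value (the match count).
    print( ( str[ i ]:gsub( '§\\n(.)', '%1' ) ) )
end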

yields

abcdefghi\n
abcdefghi§\n
abcdefghi§§\n
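
One edge case the examples in the question do not cover: because the pattern consumes the character after the delimiter, two delimiters directly after one another would leave one of them behind. If that can happen in your data, a possible variant is to use a position capture and a replacement function. This is only a sketch: the function name strip_inner_delimiters is made up, and it assumes the delimiter contains a real line feed rather than the literal backslash-n used in the test code above.

```lua
-- Hypothetical variant: remove every '§\n' except one at the very end of the string.
-- Assumption: the delimiter is '§' followed by a real line feed character.
local function strip_inner_delimiters(s)
    -- '()' is a position capture: it yields the index just past the match.
    return (s:gsub('§\n()', function(next_pos)
        if next_pos <= #s then
            -- Something follows the delimiter, so delete it.
            return ''
        end
        -- A nil result tells gsub to keep the original match untouched.
        return nil
    end))
end

print(strip_inner_delimiters('abc§\n§\ndef§\n'))
```

Since nothing past the delimiter is consumed, back-to-back delimiters are each matched and removed on their own.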
mkluwe

EDIT: This answer does not work in Lua itself, since Lua patterns have no lookahead. If you have a similar problem and are not constrained to Lua, you might still be able to use it.

So if I understand correctly, you want a regex replace that turns your first set of examples into the second. This:

/(.*?)§\\n(?=.*\\n)/g

will eliminate the non-last delimiters when replaced with

$1

in PCRE, at least. I'm not sure what flavor Lua follows, but you can see the example in action here.

REGEX:
/(.*?)§\\n(?=.*\\n)/g

TEST STRING:
abc§\ndef§\nghi\n abc§\ndef§\nghi§\n abc§\ndef§\nghi§§\n

SUBSTITUTION:
$1

RESULT:
abcdefghi\n abcdefghi§\n abcdefghi§§\n

Devon Parsons
  • Ah, I see. I've never used Lua before. I'll keep the answer here in case it does come in useful for someone else, though. – Devon Parsons Jun 19 '14 at 13:27