
I need a pattern to use with string.gmatch that matches each run of alphanumeric characters as one token, and each non-alphanumeric character (quotes, brackets, colons and the like) as a separate, single-character token. So basically:

str = [[
    function test(arg1, arg2) {
        dosomething(0x12f, "String");
    }
]]

for token in str:gmatch(regex) do
    print(token)
end

Should print:

function
test
(
arg1
,
arg2
)
{
dosomething
(
0x12f
,
"
String
"
)
;
}

How can I achieve this? In standard regex I've found that ([a-zA-Z0-9]+)|([\{\}\(\)\";,]) works for me, but I'm not sure how to translate this to Lua's patterns.

user6245072

2 Answers


Lua patterns do not support alternation (|), so you need a workaround. One option uses a temporary character that does not occur in your code. E.g., insert a § after each alphanumeric chunk and after each non-alphanumeric character, then split on it:

str = str:gsub("%s*(%w+)%s*", "%1§") -- Trim chunks of 1+ alphanumeric characters and add a temp char after them
str = str:gsub("(%W)%s*", "%1§")     -- Right trim the non-alphanumeric char one by one and add the temp char after each
for token in str:gmatch("[^§]+") do  -- Match chunks of chars other than the temp char
    print(token)
end


Note that %w in Lua is equivalent to [a-zA-Z0-9] in JS (in the default C locale): unlike \w, it does not match an underscore, _.
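To illustrate the point about the underscore, here is a minimal sketch, assuming the input may contain identifiers with underscores. The sample string and the helper name `collect` are hypothetical and not from the question. `%w+` splits at the underscore, while the set `[%w_]+` keeps the identifier whole.

```lua
-- Hypothetical input; do_something and arg_1 are illustrative names only.
local sample = "do_something(arg_1)"

-- Collect every match of a pattern into a table (helper name is an assumption).
local function collect(s, pattern)
    local out = {}
    for m in s:gmatch(pattern) do
        out[#out + 1] = m
    end
    return out
end

local plain = collect(sample, "%w+")              -- %w excludes the underscore
local with_underscore = collect(sample, "[%w_]+") -- underscore added to the set

print(table.concat(plain, " "))           -- do something arg 1
print(table.concat(with_underscore, " ")) -- do_something arg_1
```

If underscores should be part of identifiers in the tokenizer, the same substitution of `[%w_]` for `%w` would be the place to start.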

Wiktor Stribiżew
local str = [[
    function test(arg1, arg2) {
        dosomething(0x12f, "String");
    }
]]

for p, w in str:gmatch"(%p?)(%w*)" do
   if p ~= "" then print(p) end
   if w ~= "" then print(w) end
end
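The loop above can be read as follows: at each position the pattern takes at most one punctuation character (`%p?`) followed by a run of alphanumerics (`%w*`); on whitespace both captures are empty, so nothing is printed. A minimal sketch of the same idea wrapped in a function that returns the tokens as a table is shown here; the name `tokenize` is an assumption for illustration and not part of the answer.

```lua
-- tokenize is a hypothetical helper name, not from the original answer.
local function tokenize(s)
    local tokens = {}
    -- p: zero or one punctuation character, w: zero or more alphanumerics
    for p, w in s:gmatch("(%p?)(%w*)") do
        if p ~= "" then tokens[#tokens + 1] = p end
        if w ~= "" then tokens[#tokens + 1] = w end
    end
    return tokens
end

local tokens = tokenize('dosomething(0x12f, "String");')
print(table.concat(tokens, " ")) -- dosomething ( 0x12f , " String " ) ;
```

Returning a table rather than printing makes the result easier to inspect or pass to a later parsing step.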
Egor Skriptunoff