
(Sorry for my broken English)
What I'm trying to do is match either a word (with or without numbers and special characters) or whitespace characters (spaces, tabs, and possibly newlines) in a string in Lua. For example:

local my_string = "foo bar"
my_string:match(regex)    --> should return 'foo', ' ', 'bar'

my_string = "   123!@."     -- note: three spaces before '123!@.'
my_string:match(regex)    --> should return ' ', ' ', ' ', '123!@.'

Here regex is the Lua pattern I'm asking for. Of course I've searched on Google, but I couldn't find anything useful. So far I've tried [%s%S]+ and [%s+%S+], but neither seems to work.

Any solution using the standard library, e.g. string.find, string.gmatch etc., is OK.

nickkoro
  • From the description it seems that the words should be separated by a single space, yet the patterns you've tried all match one or more. – Dimitry May 21 '17 at 23:01
  • @Dimitry My title says 'a word or whitespaces', where 'whitespaces' is plural, so multiple whitespace characters **or** one word. The same goes for the description. As I said, my English is far from the best, so please correct me if I'm wrong. – nickkoro May 22 '17 at 15:24

1 Answer

string.match returns either the captures or, if the pattern defines none, the whole match; your patterns define no captures, so each call returns a single string. [%s%S]+ matches one or more characters that are each either whitespace or not whitespace, which is basically everything, so it returns the whole string. [%s+%S+] is plain wrong: a character class [ ] is a set of single-character members. It does not treat sequences of characters in any special way ("[cat]" matches "c", "a" or "t", not "cat"), nor does + act as a quantifier inside it. So [%s+%S+] means "a single character that is whitespace, a plus sign, or not whitespace", i.e. any one character.
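A quick way to see both problems is to run the two patterns from the question against "foo bar". This is a small illustrative snippet, not part of the original answer; s is just a local test string.

```lua
local s = "foo bar"

-- [%s%S]+ : one or more characters that are whitespace or non-whitespace,
-- i.e. any character, so it swallows the whole string
print(s:match("[%s%S]+"))   --> foo bar

-- [%s+%S+] : a set matching ONE character (whitespace, '+', or non-whitespace),
-- with no quantifier outside the brackets, so it matches only the first character
print(s:match("[%s+%S+]"))  --> f
```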

The first example 'foo', ' ', 'bar' could be solved by:

regex="(%S+)(%s)(%S+)"
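A minimal check of that pattern, assuming my_string is set as in the question: with three capture groups, match returns three separate values instead of the whole match.

```lua
local my_string = "foo bar"
local regex = "(%S+)(%s)(%S+)"

-- three capture groups, so match returns three values
local a, b, c = my_string:match(regex)
-- a == "foo", b == " ", c == "bar"
print(a, b, c)

-- the pattern is fixed to "word, one whitespace char, word",
-- so it finds nothing in the second example
print(("   123!@."):match(regex))  --> nil
```

Because the number of captures is fixed by the pattern, this only covers inputs shaped exactly like the first example.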

If you want a variable number of captures, you'll need the gmatch iterator:

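local capt = {}
-- each match: optional leading whitespace, a word, optional trailing whitespace
for q, w, e in my_string:gmatch("(%s*)(%S+)(%s*)") do
  if #q > 0 then               -- empty captures are "" (never nil), so check the length
    table.insert(capt, q)
  end
  table.insert(capt, w)
  if #e > 0 then
    table.insert(capt, e)
  end
end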

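Note, however, that this does not discern between a single space and several: a run of whitespace (such as the three leading spaces in your second example) comes back as one string, and a string made only of whitespace produces no matches at all, since every match needs at least one non-space character. You'll need to add those checks when processing the match results.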
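One way to get exactly the output asked for in the question (each whitespace character on its own, words kept whole) is to walk the string with string.find and an anchored pattern instead of gmatch. This is a minimal sketch assuming Lua 5.1 or later; tokenize is a hypothetical helper name, not a standard library function.

```lua
-- Hypothetical helper: split s into words (runs of non-whitespace characters)
-- and individual whitespace characters, preserving their order.
local function tokenize(s)
  local tokens = {}
  local i = 1
  while i <= #s do
    -- "^" anchors the search at position i, so this only succeeds
    -- if a non-whitespace run starts exactly there
    local first, last = s:find("^%S+", i)
    if first then
      tokens[#tokens + 1] = s:sub(first, last)
      i = last + 1
    else
      -- the character at i is whitespace: emit it as a single token
      tokens[#tokens + 1] = s:sub(i, i)
      i = i + 1
    end
  end
  return tokens
end
```

For "foo bar" this yields 'foo', ' ', 'bar', and for "   123!@." it yields ' ', ' ', ' ', '123!@.'. A whitespace-only string simply becomes one token per whitespace character.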

Lua's standard patterns are simple; if you need more intricate matching, you might want to have a look at the LPeg library.

Dimitry
  • Thank you for your answer. My understanding of Lua patterns is very limited, so thank you for the explanation, too. The `gmatch` version worked for my case. – nickkoro May 22 '17 at 15:20