I have both the "5.1 Reference Manual" and "Programming in Lua: 3rd Ed." in front of me. Reading these, along with numerous searches on the web, still leaves me a bit confused when it comes to using string.match and string.gmatch.

I understand that they both are used to locate patterns.

Here is an example they use in the "Reference Manual" for string.gmatch:

s = "hello world from Lua"
for w in string.gmatch (s, "%a+") do
    print(w)
end

I understand that this will iterate over all of the words in s and print each one on its own line.

Here is an example they use in the "Programming in Lua" book for string.match:

date = "Today is 17/7/1990"
d = string.match(date, "%d+/%d+/%d+")
print(d) -- prints 17/7/1990

What I'm confused about is when it is appropriate to use one over the other.

For example, suppose you had text to parse that contained the same pattern dozens of times, and that pattern contained variables you needed. Which would be the better choice? Example text below (the x's are variable data that differ from line to line; "data" stands for any noise you don't care about):

Header contains variable (HERE) and (HERE) I want.  
    data data data data data data data data 
    <Font Typeset:xxxx Font Color:xxx Font Xpos:xxx Font Ypos:xxx Font Bold:X Font Uline:X Font Italic:X Font Text:XXXXXXXXX>
    data data data data data data data
    <Font Typeset:xxxx Font Color:xxx Font Xpos:xxx Font Ypos:xxx Font Bold:X Font Uline:X Font Italic:X Font Text:XXXXXXXXX>
    <Font Typeset:xxxx Font Color:xxx Font Xpos:xxx Font Ypos:xxx Font Bold:X Font Uline:X Font Italic:X Font Text:XXXXXXXXX>
    <Font Typeset:xxxx Font Color:xxx Font Xpos:xxx Font Ypos:xxx Font Bold:X Font Uline:X Font Italic:X Font Text:XXXXXXXXX>
    data data data data data data data data data 
    <Font Typeset:xxxx Font Color:xxx Font Xpos:xxx Font Ypos:xxx Font Bold:X Font Uline:X Font Italic:X Font Text:XXXXXXXXX>
    <Font Typeset:xxxx Font Color:xxx Font Xpos:xxx Font Ypos:xxx Font Bold:X Font Uline:X Font Italic:X Font Text:XXXXXXXXX>
    data data data data data data data data data data data data data data data data data data data data data data data data 
    <Font Typeset:xxxx Font Color:xxx Font Xpos:xxx Font Ypos:xxx Font Bold:X Font Uline:X Font Italic:X Font Text:XXXXXXXXX>
Footer here also has three variables I want (here)/(here) and (here)

This code obviously has a pattern to it. But if I wanted to create a simple function that parsed the data and grabbed the variables, which is the better choice?

function match(data)
    local f_type, f_color, f_xpos, f_ypos, f_bold, f_uline, f_italic, f_txt = data:match("<Font Typeset:(.-) Font Color:(.-) Font Xpos:(.-) Font Ypos:(.-) Font Bold:(.-) Font Uline:(.-) Font Italic:(.-) Font Text:(.-)>")
    print(f_type, f_color, f_xpos, f_ypos, f_bold, f_uline, f_italic, f_txt)
end

...or...

function gmatch(data)
    local f_type, f_color, f_xpos, f_ypos, f_bold, f_uline, f_italic, f_txt = data:gmatch("<Font Typeset:(.-) Font Color:(.-) Font Xpos:(.-) Font Ypos:(.-) Font Bold:(.-) Font Uline:(.-) Font Italic:(.-) Font Text:(.-)>")
    print(f_type, f_color, f_xpos, f_ypos, f_bold, f_uline, f_italic, f_txt)
end

  1. Does gmatch just iterate over the entire code (data in this example) and return all instances where the pattern matches, whereas match only returns the first?

  2. In what scenarios is one better than the other?

ETA: I added a header and footer to the example code. The header and footer both contain variables I want to use. This entire chunk of code (header/body/footer) is repeated numerous times throughout the same input file that I want to parse. So there are patterns within patterns.

Pwrcdr87
  • That `gmatch` example does not work. `gmatch` returns an iterator, not all the matches at once. That's why the earlier example uses it in a loop. That's the main point: do you need all the matches now (and can you match them all that way), or do you want to loop over the individual matches as they are found? – Etan Reisner Feb 18 '15 at 20:41
  • @EtanReisner The data would contain a header and a footer that have variables in them. The body also has a pattern that needs to be concatenated together to form a finished string. I'll update my example code above to show what I mean. Sorry for not doing so initially. – Pwrcdr87 Feb 18 '15 at 20:52

1 Answer

  1. Does gmatch just iterate over the entire code (data in this example) and return all instances where the pattern matches, whereas match only returns the first?

    It returns an iterator for doing so.

    Returns an iterator function that, each time it is called, returns the next captures from pattern (see §6.4.1) over the string s. If pattern specifies no captures, then the whole match is produced in each call.

  2. In what scenarios is one better than the other?

    • string.gsub is best when you need to substitute the matches without regard to their position.
    • string.gmatch is best when you are only interested in the matches, and want to iterate over them.
    • string.match gives you all the captures from the first match.
    • string.find is the most versatile, returning the first match and its position. The cost is not being specialized for any task, thus needing more code.
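
As a rough illustration of the four functions above, here is a minimal sketch. The sample input is hypothetical: it is a shortened version of the question's `<Font ...>` lines with made-up values, and the variable names are only for this example. Note that `gmatch` is driven by a `for` loop, because it returns an iterator rather than the captures themselves.

```lua
-- Hypothetical sample input, shortened from the format in the question.
local data = [[
<Font Typeset:Arial Font Color:red Font Text:Hello>
noise noise noise
<Font Typeset:Courier Font Color:blue Font Text:World>
]]

local pattern = "<Font Typeset:(.-) Font Color:(.-) Font Text:(.-)>"

-- string.match: the captures from the first match only.
local f_type, f_color, f_txt = data:match(pattern)
print(f_type, f_color, f_txt)

-- string.gmatch: returns an iterator; the for loop calls it once per match.
local texts = {}
for typeset, color, txt in data:gmatch(pattern) do
    texts[#texts + 1] = txt
end
print(table.concat(texts, " "))

-- string.find: start and end positions of the first match, then its captures.
local first, last, found_type = data:find(pattern)
print(first, last, found_type)

-- string.gsub: replaces every match; also returns the number of replacements.
local stripped, count = data:gsub("<Font.->", "")
print(count)
```

Calling `data:gmatch(pattern)` outside a loop and assigning the result to several variables, as in the question's second function, would put the iterator function in the first variable and `nil` in the rest.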
Deduplicator
  • Just to clarify: if my file consisted of a header/body/footer pattern, and this overall pattern was repeated 4x, I'd be best off using string.match to grab each one separately, resulting in a new sub-string of header/body/footer. Then I would parse the sub-pattern in each newly grabbed sub-string with string.gmatch, since that is an iterator and lets me iterate through the body for the variable grabs? – Pwrcdr87 Feb 18 '15 at 21:15
  • That is one possibility. You could also iterate with `gmatch` and grab a header+body+footer triplet each iteration. – Deduplicator Feb 18 '15 at 21:31
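
The second approach from the comments (an outer `gmatch` per header+body+footer triplet, an inner `gmatch` per `<Font ...>` tag) might look roughly like this. This is only a sketch: the exact `Header ...` and `Footer ...` line formats, the shortened font tag, and all names below are assumptions made for the example, since the question does not show the real header and footer text.

```lua
-- Hypothetical input: two header/body/footer blocks. The "Header ..." and
-- "Footer ..." line formats are invented for this sketch.
local input = [[
Header A B
noise
<Font Typeset:Arial Font Text:Hello>
<Font Typeset:Arial Font Text:World>
Footer 1/2 3
Header C D
<Font Typeset:Courier Font Text:Bye>
noise noise
Footer 4/5 6
]]

local block_pattern = "Header (%w+) (%w+)\n(.-)Footer (%w+)/(%w+) (%w+)"
local font_pattern = "<Font Typeset:(.-) Font Text:(.-)>"

local results = {}
-- Outer gmatch: one iteration per header/body/footer triplet.
for h1, h2, body, f1, f2, f3 in input:gmatch(block_pattern) do
    local texts = {}
    -- Inner gmatch: one iteration per <Font ...> tag inside this body.
    for _, txt in body:gmatch(font_pattern) do
        texts[#texts + 1] = txt
    end
    results[#results + 1] = {
        header = { h1, h2 },
        footer = { f1, f2, f3 },
        text = table.concat(texts, " "),
    }
end

for i, r in ipairs(results) do
    print(i, r.header[1], r.header[2], r.text, r.footer[1], r.footer[2], r.footer[3])
end
```

Because the body is captured with the lazy `(.-)`, this sketch assumes the word `Footer` never appears inside the noise; real data may need a stricter block pattern.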