
I need to write the Lua equivalent of the following regular expression (regex)

```
\b[0-9]*.\b[0-9]*(?!])
```

for use with Lua's string.gmatch. Can this be done?

For reference, the above expression matches both the integer and the fractional part of a number, with or without a decimal separator (e.g. `1`, `1.1`, `0.1` and `0.11` all match fully). However, if a trailing `]` is present, only the integer part and the separator are matched (e.g. only `1.` is matched in `1.1]`).

Kyle F Hartzenberg
Goles
  • By the way, your regular expression does not really match what you want it to match. It matches the empty string, but does not match `.11`. – Paŭlo Ebermann Jun 12 '11 at 16:02
  • Fixed that, it was not really required :) – Goles Jun 13 '11 at 04:06
  • It also does not match `0.11` ... you want a `*` instead of the `?` after your second `[0-9]`, I think. (And maybe one of the `*` should be a `+` to avoid matching the empty string.) – Paŭlo Ebermann Jun 13 '11 at 11:13

2 Answers


Lua does not have regular expressions, mainly because a full regular expression library would be bigger than Lua itself.

What Lua has instead are matching patterns, which are way less powerful (but still sufficient for many use cases):

  • There is no "word boundary" matcher,
  • no alternatives,
  • and also no lookahead or similar.

I don't think there is a single Lua pattern that matches every occurrence you are after and nothing else, so you will have to work around this somehow.

The pattern proposed by Stuart, `%d*%.?%d*`, matches all decimal numbers (with or without a dot), but it also matches the empty string, which is not very useful. `%d+%.?%d*` matches all decimal numbers with at least one digit before the dot (or without a dot); `%d*%.?%d+` matches all decimal numbers with at least one digit after the dot (or without a dot); `%.%d+` matches decimal numbers without a digit before the dot.
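A quick way to see how these patterns differ is to collect everything `string.gmatch` returns for each of them on a small sample. This is only a sketch: the helper name `all_matches` and the sample string are made up for illustration. The second pattern is written with the dot escaped (`%.`), since an unescaped `.` in a Lua pattern matches any character, and the empty-matching `%d*%.?%d*` is left out.

```lua
-- Hypothetical helper: collect every match of `pattern` in `s` into a table.
local function all_matches(s, pattern)
  local out = {}
  for m in string.gmatch(s, pattern) do
    out[#out + 1] = m
  end
  return out
end

local sample = "1. .5 2.25"

-- At least one digit before the (optional) dot.
local before = all_matches(sample, "%d+%.?%d*")   -- {"1.", "5", "2.25"}
-- At least one digit after the (optional) dot.
local after = all_matches(sample, "%d*%.?%d+")    -- {"1", ".5", "2.25"}
-- A dot followed by digits, no digit required before it.
local dotted = all_matches(sample, "%.%d+")       -- {".5", ".25"}

print(table.concat(before, " "))
print(table.concat(after, " "))
print(table.concat(dotted, " "))
```

Neither of the first two patterns returns both `1.` and `.5` intact, which is the gap that combining patterns is meant to close.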

A simple solution would be to search for more than one of these patterns (for example, both `%d+%.?%d*` and `%.%d+`) and combine the results. Then look at the positions where you found them and check whether a `]` follows them.
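One possible sketch of that idea follows. Instead of running two separate `gmatch` passes and merging them (which would report both `11` and `.11` for the same text), it tries `%d+%.?%d*` and then `%.%d+` at each position, anchored with `^`, and then checks the next character for `]`. The name `find_numbers` is hypothetical, and what to do when a `]` follows is an assumption, since the answer leaves that step open: here such a number is cut back to its `digits.` prefix (mirroring the question's `1.` example) or skipped if it has none.

```lua
-- Hypothetical helper: scan `s` with two anchored patterns and handle a
-- trailing "]" by keeping only the "digits." prefix (an assumption).
local function find_numbers(s)
  local results = {}
  local pos = 1
  while pos <= #s do
    -- "^" anchors each find at `pos`, so both patterns start at the same spot.
    local i, j = string.find(s, "^%d+%.?%d*", pos)
    if not i then
      i, j = string.find(s, "^%.%d+", pos)
    end
    if i then
      local num = string.sub(s, i, j)
      if string.sub(s, j + 1, j + 1) == "]" then
        -- Followed by "]": keep "1." from "1.1", skip anything without it.
        local head = string.match(num, "^%d+%.")
        if head then
          results[#results + 1] = head
        end
      else
        results[#results + 1] = num
      end
      pos = j + 1
    else
      pos = pos + 1
    end
  end
  return results
end

local found = find_numbers("1 1.1 0.1 .11 1.1]")
print(table.concat(found, " "))
```

Like the plain patterns, this does not check what precedes a number, so it is not a full replacement for the `\b` in the original regex.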


I experimented a bit with the frontier pattern.

The pattern `%f[%.%d]%d*%.?%d*%f[^%.%d%]]` matches every decimal number that is preceded by something other than a digit or dot (or by nothing) and followed by something other than `]`, a digit, or a dot (or by nothing). It also matches a lone dot, though.
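To see what this frontier pattern does, it can be run through `string.gmatch` on a sample (the string below is invented for illustration). `%f[set]` matches the empty position where the previous character is not in `set` and the next one is; the start and end of the subject are treated as the character `\0`. The frontier item is documented from Lua 5.2 on.

```lua
-- The frontier pattern from the answer above.
local pattern = "%f[%.%d]%d*%.?%d*%f[^%.%d%]]"

local found = {}
for n in string.gmatch("1 1.1 0.1 .11 2.5] 3. end.", pattern) do
  found[#found + 1] = n
end
print(table.concat(found, " "))
```

On this sample it yields `1`, `1.1`, `0.1`, `.11` and `3.`, plus the lone `.` from `end.`. The `2.5]` part yields nothing at all, not even `2.`, which is where it still falls short of the question's regex.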

Paŭlo Ebermann
  • There's this [frontier pattern](http://lua-users.org/wiki/FrontierPattern) which is "unofficial and undocumented" in 5.1 but will be documented in Lua 5.2. – jpjacobs May 31 '11 at 19:40
  • I've read about it, but how would the expression look using the aforementioned frontier pattern? – Goles May 31 '11 at 19:52
  • @jpjacobs: Thanks, this looks nice. I added an example using this, but it still is not quite there. – Paŭlo Ebermann May 31 '11 at 20:05
  • That's a great solution. It's also a great example of why the frontier pattern can be very useful. Thanks for sharing this. – Goles May 31 '11 at 20:17
  • I know this is really old, but how do you determine the patterns? This post was incredibly helpful, so it would be nice to understand how you put together the "?", "*", "%d", etc. Thanks! :) – MrHappyAsthma May 21 '13 at 14:40
  • @MrHappyAsthma I don't remember much (as this was two years ago), but I suppose it all came from the linked documentation page, as well as some experimenting. The information about the frontier pattern came from the page linked by jpjacobs. – Paŭlo Ebermann May 21 '13 at 18:51

`"%d*%.?%d+"` will match all such numbers in decimal format (note that it ignores signs, so `-1.1` or `+3.14` will be matched as `1.1` and `3.14`). You'll need another approach to avoid instances that end with `]`, such as removing them from the string before looking for the numbers:

```lua
local function numbers(orig)
  local pattern = "%d*%.?%d+"
  -- Drop every number that is directly followed by "]"
  local clean = string.gsub(orig, pattern .. "%]", "")
  return string.gmatch(clean, pattern)
end
```
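A self-contained run of this remove-then-match approach (the input string is invented for illustration) makes its two trade-offs visible: a number followed by `]` disappears entirely rather than leaving `1.` as the question's regex does, and the sign of `-3.14` is dropped.

```lua
local pattern = "%d*%.?%d+"
local orig = "1 1.1 0.1 .11 1.1] -3.14"

-- Remove numbers directly followed by "]", then collect what is left.
local clean = string.gsub(orig, pattern .. "%]", "")
local found = {}
for n in string.gmatch(clean, pattern) do
  found[#found + 1] = n
end
print(table.concat(found, ","))  -- prints 1,1.1,0.1,.11,3.14
```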
Stuart P. Bentley