3
local s = "http://example.com/image.jpg"
print(string.match(s, "/(.-)%.jpg"))

This gives me

--> /example.com/image

But I'd like to get

--> image
Yu Hao
  • Glad it worked for you. Please also consider upvoting if my answer proved helpful to you (see [How to upvote on Stack Overflow?](http://meta.stackexchange.com/questions/173399/how-to-upvote-on-stack-overflow)) since you have reached 15 rep points and are now entitled to upvote. – Wiktor Stribiżew Jun 25 '17 at 10:04

3 Answers

3

If you're sure there is a / in the string just before the filename, this works:

print(string.match(s, ".*/(.-)%.jpg"))

The greedy match .*/ will stop at the last /, as desired.
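
As a rough check of the greedy behaviour described above, here is a small sketch. The second and third input strings are made up for illustration and do not come from the question.

```lua
local s = "http://example.com/image.jpg"

-- ".*/" is greedy: it first runs to the end of the string, then backs up
-- to the last "/" from which the rest of the pattern can still match.
print(string.match(s, ".*/(.-)%.jpg"))  --> image

-- With several path segments, the capture still starts after the last "/".
print(string.match("http://example.com/a/b/photo.jpg", ".*/(.-)%.jpg"))  --> photo

-- Without any "/" in the string, the pattern cannot match and returns nil.
print(string.match("image.jpg", ".*/(.-)%.jpg"))  --> nil
```

The last call shows why the answer is conditional on a / being present before the filename.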

lhf
2

Since the pattern matcher scans the string from left to right, your pattern matched at the first /. Then .- matched as few characters as possible (. is any character, - is the lazy quantifier) up to the first literal . (matched with %.) that is followed by jpg.
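
One way to see where the match actually starts is string.find, which returns the start and end indices of the match before any captures. This is only an illustrative sketch using the string from the question; the variable names are arbitrary.

```lua
local s = "http://example.com/image.jpg"

-- string.find returns the start index, the end index, and then the captures.
local first, last, capture = string.find(s, "/(.-)%.jpg")

-- first   = 6  (the first "/" in "http://")
-- last    = 28 (the final "g" of ".jpg")
-- capture = "/example.com/image"
print(first, last, capture)
```

The match begins at the first /, so the lazy capture has to stretch across the rest of the URL to reach .jpg.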


You need to use a negated character class [^/] (to match any char but /) rather than . that matches any character:

local s = "http://example.com/image.jpg"
print(string.match(s, "/([^/]+)%.jpg"))
-- => image


The [^/] matches any char but /, so the capture cannot contain a /. As a result, the first / in the pattern "/([^/]+)%.jpg" can only line up with the last / in the string.

Removing the first / from the pattern is not a good idea, as it makes the engine take more redundant steps while looking for a match: the / "anchors" the quantified subpattern at a / symbol. It is easier for the engine to find a / than to try, at every position from the start of the string, to match one or more chars other than /.

If you are sure the file name appears at the end of the string, add $ at the end of the pattern (it is not clear whether you actually need that, but it might be best in the general case).
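
To illustrate what the $ anchor changes, here is a small sketch. The URL with a directory named a.jpg is a made-up input, not one from the question.

```lua
-- Hypothetical URL whose directory name happens to contain ".jpg".
local tricky = "http://example.com/a.jpg/b.png"

-- Without "$", the pattern is satisfied by the directory name.
print(string.match(tricky, "/([^/]+)%.jpg"))   --> a

-- With "$", ".jpg" must be the very end of the string, so nothing matches.
print(string.match(tricky, "/([^/]+)%.jpg$"))  --> nil

-- The original string still matches with the anchor in place.
print(string.match("http://example.com/image.jpg", "/([^/]+)%.jpg$"))  --> image
```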

Wiktor Stribiżew
  • The leading `/` is redundant. Also, without a `$` at the end, the pattern may unexpectedly match a folder name that contains `.jpg` (however unlikely that is). Perhaps `'([^/]+)%.jpg$'` is more accurate. – tonypdmtr Jun 23 '17 at 23:23
  • @tonypdmtr: I understand the `$` is a good idea, but removing `/` is really not good for performance. – Wiktor Stribiżew Jun 25 '17 at 10:12
2

Why doesn't this match non-greedily and give me just the image name?

To answer the question directly: .- doesn't guarantee the shortest match. The left end of the match is still anchored at the current position, and if a match exists at that position, it is returned as the result. Non-greedy only means that .- consumes the fewest characters it can while still letting the rest of the pattern match. That's why using [^/]- fixes the pattern: it matches the fewest characters that are not forward slashes, so the capture cannot span a /. It is also why .*/.- works: .* greedily consumes everything and then backtracks until the rest of the pattern is satisfied, which produces the same result in this case.
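
The three variants discussed above can be compared side by side. This is a minimal sketch using the string from the question; the capture parentheses around .- and [^/]- are added here so that each call returns only the file name part.

```lua
local s = "http://example.com/image.jpg"

-- Lazy ".-" still starts at the first "/" the matcher reaches.
print(string.match(s, "/(.-)%.jpg"))     --> /example.com/image

-- "[^/]-" cannot cross a "/", so the earlier start positions fail.
print(string.match(s, "/([^/]-)%.jpg"))  --> image

-- Greedy ".*/" consumes through the last "/" before the lazy capture begins.
print(string.match(s, ".*/(.-)%.jpg"))   --> image
```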

Paul Kulchenko