
Suppose I have a string such as:

This is a website, it is at http://www.abc.com/post_id?id=123&key=456, please visit it and let me know. Thanks

How can I parse this string in Lua to obtain three substrings?

String 1 - the text before the http(s) URL

String 2 - the http(s) URL itself (with all parameters)

String 3 - the text after the http(s) URL

Please note that there might be no space before "http". Thanks.

Joe Huang

1 Answer


The simplest pattern would be: `(.+)%s+(https?%S+)%s+(.*)$`

local str = "This is a website, it is at http://www.abc.com/post_id?id=123&key=456, please visit it and let me know. Thanks"
local sPre, sLink, sPost = str:match( "(.+)%s+(https?%S+)%s+(.*)$" )

It'll give you: https://eval.in/43745

The downside is that the trailing `,` is captured as part of the URL, because `%S+` matches every non-whitespace character up to the next space.
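One way to deal with the trailing comma is to split it off after the match. The following is only a sketch: the names `url` and `trailing` are introduced here for illustration, and it assumes a URL never legitimately ends in one of `, . ; : ! ?`.

```lua
local str = "This is a website, it is at http://www.abc.com/post_id?id=123&key=456, please visit it and let me know. Thanks"
local sPre, sLink, sPost = str:match("(.+)%s+(https?%S+)%s+(.*)$")

-- Separate trailing punctuation from the captured link.
-- Assumption: the URL itself never ends in , . ; : ! or ?
local url, trailing = sLink:match("^(.-)([,%.;:!?]*)$")

print(sPre)      --> This is a website, it is at
print(url)       --> http://www.abc.com/post_id?id=123&key=456
print(trailing)  --> ,
print(sPost)     --> please visit it and let me know. Thanks
```

If the punctuation should stay in the text, it can be prepended to `sPost` with `trailing .. " " .. sPost`.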


The middle capture, `(https?%S+)`, is where you control what counts as a URL. If the surrounding text might itself contain the word "http", tighten it to `(https?://%S+)` or a similar stricter variant.
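To illustrate the difference between the loose and the stricter middle capture, here is a small sketch. The sample sentence and the `example.org` address are made up for the demonstration.

```lua
-- Hypothetical input in which "http" also appears inside an ordinary word.
local str = "The httpd docs live at https://example.org/docs today"

-- The loose capture stops at the first run of characters starting with "http".
local loose = str:match("(https?%S+)")
-- Requiring "://" skips the plain word and finds the real link.
local strict = str:match("(https?://%S+)")

print(loose)   --> httpd
print(strict)  --> https://example.org/docs
```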

hjpotter92
  • Thanks a lot. It works great. But there is one more problem: if the string contains more than one http URL, it returns the last one in sLink. How can I make it return the first http URL instead? – Joe Huang Aug 19 '13 at 08:14
  • (2) There is a second problem, what if there is no space before "http"? Most of the strings I want to parse have no space before "http"... Please help. Thanks. – Joe Huang Aug 19 '13 at 08:47
  • I'm no regex expert, but you could run a while loop over sPost until there are no more links, adding each sLink to a table. – Frederik Spang Aug 19 '13 at 09:23
  • @JoeHuang, to extract the first URL, use `(.-)` instead of `(.+)`. – lhf Aug 19 '13 at 09:54
  • @JoeHuang, to allow no space before "http", just remove the first `%s+` from the pattern. – lhf Aug 19 '13 at 11:00
  • @JoeHuang Here is another [eval.in](https://eval.in/43827) example to show how to catch multiple links. – hjpotter92 Aug 19 '13 at 13:12
  • What is the difference between `(https?%S+)` and `(http%S+)` ? – Egor Skriptunoff Aug 19 '13 at 20:25
  • @lhf - the second one also allows `s` to be optional and so matches both `https` and `http` :-) – Egor Skriptunoff Aug 20 '13 at 04:43
  • @EgorSkriptunoff, you're right of course, sorry for the noise. – lhf Aug 20 '13 at 09:36
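
The suggestions from the comments can be combined into one short sketch: a lazy `(.-)` so the split happens at the first URL, no `%s+` before the link so a missing space is tolerated, and `string.gmatch` to collect every link. The input string and the `.example` addresses are hypothetical, and the `links` table is a name introduced here.

```lua
-- Hypothetical input: two URLs, the first with no space before "http".
local str = "See:http://a.example/x?id=1 and also https://b.example/y for details"

-- (.-) is lazy, so it stops at the first URL; the first %s+ has been
-- removed, so no space is required in front of "http".
local sPre, sLink, sPost = str:match("(.-)(https?%S+)%s+(.*)$")
print(sPre)   --> See:
print(sLink)  --> http://a.example/x?id=1
print(sPost)  --> and also https://b.example/y for details

-- Collect every link in the string instead of only one.
local links = {}
for link in str:gmatch("https?%S+") do
  links[#links + 1] = link
end
print(#links)    --> 2
print(links[2])  --> https://b.example/y

-- As noted in the comments, "http%S+" also matches https links,
-- because %S+ consumes the "s" like any other non-space character.
print(str:match("http%S+", #sPre + #sLink + 1))  --> https://b.example/y
```

Note that this pattern still requires whitespace after the link, so a URL at the very end of the string would not be split by `str:match`; the `gmatch` loop has no such restriction.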