5

I need to extract domain (four.five) from URL (one.two.three.four.five) in a Lua string variable.

I can't seem to find a function to do this in Lua.

EDIT:

By the time the URL gets to me, the http stuff has already been stripped off. So, some examples are:

a) safebrowsing.google.com 
b) i2.cdn.turner.com 
c) powerdns.13854.n7.nabble.com 

so my result should be:

a) google.com
b) turner.com
c) nabble.com
Yu Hao
Xi Vix
  • This is an old post, but perhaps this is a useful hint: keep in mind that there are domains where the last two segments are not enough. In Great Britain, for example, a lot of domains end in .co.uk. – P.J.Meisch Apr 09 '16 at 14:41

3 Answers

7

This should work:

local url = "foo.bar.google.com"
local domain = url:match("[%w%.]*%.(%w+%.%w+)")
print(domain)       

Output: google.com

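The pattern [%w%.]*%.(%w+%.%w+) captures the content after the second dot from the end: [%w%.]* greedily consumes alphanumeric characters and dots, %. matches a literal dot, and the capture (%w+%.%w+) matches the last two alphanumeric segments with the dot between them. Note that the pattern needs at least two dots, so it returns nil for an input with no subdomain such as google.com, and that %w does not match hyphens.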
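
As a variation on the pattern above, here is a minimal sketch that anchors the match at the end of the string instead of consuming the prefix. The function name last_two_labels is made up for illustration, and the sketch assumes the input is a bare host name with no path, port, or trailing dot.

```lua
-- Sketch: capture the last two dot-separated labels of a host name.
-- last_two_labels is a hypothetical helper name, not a library function.
local function last_two_labels(host)
  -- [^.]+ is one or more non-dot characters; $ anchors at the end of the string
  return host:match("([^.]+%.[^.]+)$")
end

print(last_two_labels("safebrowsing.google.com"))
print(last_two_labels("powerdns.13854.n7.nabble.com"))
print(last_two_labels("google.com"))
print(last_two_labels("cdn.my-site.com"))
```

Because the labels are matched with [^.]+ rather than %w+, this version also accepts hyphens and inputs that have no subdomain at all. It returns nil when the input contains no dot.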

Yu Hao
5
local url = "http://foo.bar.com/?query"
print(url:match('^%w+://([^/]+)')) -- foo.bar.com

The pattern '^%w+://([^/]+)' reads as follows: ^ anchors the match at the beginning of the string, %w+ matches one or more alphanumeric characters (the protocol), :// matches literally, and [^/]+ matches one or more characters other than a slash. The parentheses capture those characters and return them as the result.
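
If the input may arrive either with or without the protocol, the same idea can be sketched as a small helper. The name host_of is hypothetical, and the sketch assumes there is no user:password@ part in the URL.

```lua
-- Sketch: return the host part of a URL, whether or not a scheme is present.
-- host_of is a hypothetical helper name, not a library function.
local function host_of(url)
  -- drop an optional "scheme://" prefix (gsub also returns a count, which is discarded)
  local rest = url:gsub("^%w+://", "")
  -- keep everything up to the first "/", ":", "?" or "#"
  return rest:match("^([^/:?#]+)")
end

print(host_of("http://foo.bar.com/?query"))
print(host_of("foo.bar.com"))
```

Stopping at ":" as well as "/" means a port number, if present, is not included in the result.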

Paul Kulchenko
  • I need to start from the end, moving from right to left, since I don't know how long the URL will be. It could be one.two.three, one.two.three.four, or one.two.three.four.five. In other languages I have done it by counting the periods from right to left and extracting the string starting at the second period from the right. I don't know how to do that in Lua. – Xi Vix Sep 13 '13 at 01:12
  • Provide an example of the URL (ideally several) you are trying to parse. – Paul Kulchenko Sep 13 '13 at 01:16
  • By the time the url gets to me, the http stuff has already been stripped off. So, some examples are: a) safebrowsing.google.com b) i2.cdn.turner.com c) powerdns.13854.n7.nabble.com ... so my result should be: a) google.com, b) turner.com, c) nabble.com – Xi Vix Sep 13 '13 at 01:32
0

Use Paul's answer to extract the full host name, something like 1.2.3.4.4.5:

local url = "http://foo.bar.com/?query"
local domain = url:match('^%w+://([^/]+)')

Next, use one of the "split" methods to build an array of its parts:

http://lua-users.org/wiki/SplitJoin

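for example:

local arr = split(domain, '%.') -- split is one of the helpers from the page above (Lua has no built-in split); the dot is escaped because it is a magic character in patterns

Then you can use the last two elements: arr[#arr - 1] and arr[#arr].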
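
The split-and-join steps can be sketched without an external split helper by using string.gmatch. The function name last_labels and its optional parameter n are assumptions made for illustration; n defaults to 2, and a caller who knows the host ends in a two-part suffix such as co.uk could pass 3 instead.

```lua
-- Sketch: split a host name on dots and join the last n labels.
-- last_labels and its parameter n are hypothetical, chosen for this example.
local function last_labels(host, n)
  n = n or 2
  local labels = {}
  -- gmatch yields each run of non-dot characters in turn
  for label in host:gmatch("[^.]+") do
    labels[#labels + 1] = label
  end
  -- not enough labels to build the requested result
  if #labels < n then return nil end
  -- table.concat accepts start and end indices, so only the tail is joined
  return table.concat(labels, ".", #labels - n + 1, #labels)
end

print(last_labels("i2.cdn.turner.com"))
print(last_labels("www.example.co.uk", 3))
```

This does not decide on its own how many labels form the registrable domain; that choice is left to the caller.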

Alex Zaharov