22

Say I have a string like this: "http://something.example.com/directory/"

What I want to do is to parse this string, and extract the "something" from the string.

The first step, obviously, is to check that the string contains "http://"; otherwise, it should be ignored.

But how do I then extract just the "something" from that string? Assume that all the strings being evaluated will have a similar structure (i.e. I am trying to extract the subdomain of the URL, provided the string being examined is indeed a valid URL, where "valid" means it starts with "http://").

Thanks.

P.S. I know how to check the first part, i.e. I can simply split the string at the "http://", but that doesn't solve the full problem because it will produce "http://something.example.com/directory/". All I want is the "something", nothing else.

Boris Stitnicky
marcamillion
  • http://www.regular-expressions.info/ruby.html – durron597 Nov 06 '12 at 01:45
  • @durron597: Don't hammer everything with regexes. A URL is a well-defined object, handled a million times both in the Ruby standard library and in a million other gems. If I were an expert, I would answer. – Boris Stitnicky Nov 06 '12 at 01:48
  • What's with all the downvotes? Don't get it. – marcamillion Nov 06 '12 at 01:55
  • What if you have something like `http://a.b.c.d/directory`, with more than 2 dots in the host? – oldergod Nov 06 '12 at 02:11
  • The downvotes are probably because there are a number of questions similar to this one. You can find out how to get the host from a URL, then find the first element of a `'.'` delimited string. – the Tin Man Nov 06 '12 at 03:09

4 Answers

39

I'd do it this way:

require 'uri'

uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first
# => "something"

URI is built into Ruby. It's not the most full-featured but it's plenty capable of doing this task for most URLs. If you have IRIs then look at Addressable::URI.
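The question also asks that strings not starting with "http://" be ignored. A minimal sketch of how the URI approach could be combined with that check follows; the helper name `first_host_label` is hypothetical, and returning nil for ignored strings is an assumption rather than something the question specifies.

```ruby
require 'uri'

# Hypothetical helper: returns the first label of the host for an http:// URL,
# or nil when the string is not an http URL or cannot be parsed at all.
def first_host_label(string)
  uri = URI.parse(string)
  # URI::HTTPS is a subclass of URI::HTTP, so the scheme is checked as well.
  return nil unless uri.is_a?(URI::HTTP) && uri.scheme == 'http' && uri.host

  uri.host.split('.').first
rescue URI::InvalidURIError
  nil
end

first_host_label('http://something.example.com/directory/') # => "something"
first_host_label('ftp://something.example.com/')            # => nil
first_host_label('not a url')                               # => nil
```

Note that this returns the first label even when the host has no real subdomain, so `http://example.com/` would give "example".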

Curious Sam
the Tin Man
9

You could use URI like this:

require 'uri'

uri = URI.parse("http://something.example.com/directory/")
puts uri.host
# prints "something.example.com"

and you could then just work on the host.
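As a rough illustration of "working on the host" with only the standard library, the sketch below keeps every label before the last two, which also covers hosts with several subdomain levels. The helper name `naive_subdomain` is hypothetical, and it assumes the registered domain is exactly two labels (such as "example.com"), which does not hold for suffixes like "co.uk".

```ruby
require 'uri'

# Hypothetical helper: everything in the host before the last two labels.
# Assumes the registered domain is exactly two labels, e.g. "example.com".
def naive_subdomain(url)
  host = URI.parse(url).host
  return nil if host.nil?

  labels = host.split('.')
  # Fewer than three labels means there is no subdomain part to return.
  return nil if labels.length < 3

  labels[0...-2].join('.')
end

naive_subdomain('http://something.example.com/directory/') # => "something"
naive_subdomain('http://a.b.example.com/directory')        # => "a.b"
naive_subdomain('http://example.com/')                     # => nil
```

For hosts under multi-label public suffixes, a library that knows the public suffix list is the safer choice.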
Alternatively, there is the domainatrix gem, mentioned in the question "Remove subdomain from string in ruby":

require 'rubygems'
require 'domainatrix'

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix       # => "co.uk"
url.domain              # => "pauldix"
url.subdomain           # => "foo.bar"
url.path                # => "/asdf.html?q=arg"
url.canonical           # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"

and you could just take the subdomain.

Community
oldergod
2

Well, you can use regular expressions, something like /http:\/\/([^\.]+)/, which captures the first run of non-'.' characters after "http://".

Check out http://rubular.com/. You can test your regular expressions against a set of test strings there, which makes it great for learning this tool.
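To show how such a pattern might be applied in Ruby, here is a small sketch. The helper name `subdomain_by_regex` is hypothetical, and the pattern is a slightly stricter variant of the one above: it is anchored to the start of the string, excludes "/" from the capture, and requires a dot after it.

```ruby
# Hypothetical helper: returns the text between "http://" and the first dot,
# or nil when the string does not start with "http://" or has no dot in the host.
def subdomain_by_regex(url)
  # String#[] with a regex and a group index returns that capture group or nil.
  url[%r{\Ahttp://([^./]+)\.}, 1]
end

subdomain_by_regex('http://something.example.com/directory/') # => "something"
subdomain_by_regex('https://something.example.com/')          # => nil
subdomain_by_regex('http://localhost:3000')                   # => nil
```

Using `%r{...}` avoids having to escape the forward slashes in the pattern.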

the Tin Man
resilva87
2

With URI.parse you can get the individual parts of the URL:

require "uri"

uri = URI.parse("http://localhost:3000")
uri.scheme # => "http"
uri.host   # => "localhost"
uri.port   # => 3000
Dorian