Dealing with domains is going to get a lot more complex in the future, with new TLDs coming online. Assuming that .edu
is the only educational TLD will be wrong.
A simple way to grab just the domain for now is:
"gates@harvard.edu"[/(@.+)$/, 1] # => "@harvard.edu"
That will handle things like:
"gates@mail.harvard.edu"[/(@.+)$/, 1] # => "@mail.harvard.edu"
If you don't want the @, simply shift the opening parenthesis right one character:
pattern = /@(.+)$/
"gates@harvard.edu"[pattern, 1] # => "harvard.edu"
"gates@mail.harvard.edu"[pattern, 1] # => "mail.harvard.edu"
If you want to normalize the domain to strip off sub-domains, you can do something like:
pattern = /(\w+\.\w+)$/
"harvard.edu"[pattern, 1] # => "harvard.edu"
"mail.harvard.edu"[pattern, 1] # => "harvard.edu"
which grabs only the last two "words", separated by a single "." character.
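One caveat with the pattern above: \w does not match hyphens, so a hyphenated name loses everything before the hyphen. A minimal sketch of an alternative that splits on dots instead of using a regex follows; the helper name base_domain is made up for illustration and is not part of the original answer.

```ruby
# Hypothetical helper; the name base_domain is only an illustration.
def base_domain(host)
  # Keep the last two dot-separated labels, whatever characters they contain.
  host.split(".").last(2).join(".")
end

base_domain("harvard.edu")        # => "harvard.edu"
base_domain("mail.harvard.edu")   # => "harvard.edu"
base_domain("mail.my-school.edu") # => "my-school.edu"

# For comparison, the \w-based pattern drops the part before the hyphen:
"mail.my-school.edu"[/(\w+\.\w+)$/, 1] # => "school.edu"
```

Like the regex, this still assumes the interesting part is always the last two labels.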
That's somewhat naive, as non-US domains can have a country code, so if you need to handle those you can do something like:
pattern = /(\w+\.edu(?:\.\w+)?)$/
"harvard.edu"[pattern, 1] # => "harvard.edu"
"harvard.edu.cc"[pattern, 1] # => "harvard.edu.cc"
"mail.harvard.edu.cc"[pattern, 1] # => "harvard.edu.cc"
As for whether you should do this before or after you've verified their address: do it AFTER. Why waste your CPU time and disk space processing invalid addresses?
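As a rough sketch of that ordering, the snippet below filters addresses first and only then extracts the domain. It assumes a plain syntax check with Ruby's built-in URI::MailTo::EMAIL_REGEXP stands in for whatever verification you actually use (a confirmation email, for instance); the method name domains_for_valid is hypothetical.

```ruby
require "uri"

# Hypothetical helper: drop addresses that fail a syntax check, then extract domains.
# The syntax check is an assumption standing in for real address verification.
def domains_for_valid(addresses)
  addresses
    .select { |address| address.match?(URI::MailTo::EMAIL_REGEXP) }
    .map { |address| address[/@(.+)\z/, 1] } # \z anchors at the true end of the string
end

domains_for_valid(["gates@harvard.edu", "not-an-address", "gates@"])
# => ["harvard.edu"]
```

Using \z rather than $ here is a small hardening choice, since $ in Ruby also matches just before a newline.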