How to sort .edu email domains?

Question

I am using Ruby on Rails to build a university-exclusive website that groups all registered users by university, based on their ".edu" email address. Nearly all US-based universities have an "xyz.edu" email domain. In essence, everyone who signs up with a ".edu" email would be grouped with the other users who share the same "domain.edu".

I've searched for a regex to look for like-domains.edu and assign them into a variable or specific indexes, but I must be looking in the wrong place because I cannot find how to do this.

Would I use regex for this? Or maybe a method after their email has been verified?

I would appreciate any help or feedback I can get.

You can use regex to match patterns, not to sort things. Of course, you could use Ruby to sort things based on matches which you've made using regex. — Vasili Syrakis, Dec 19 '13 at 02:44
What code have you written? "Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See http://SSCCE.org for guidance." — the Tin Man, Dec 19 '13 at 03:18

score 2 · Answer 1 · answered Dec 19 '13 at 02:48

You could use a regex to extract domain names:

"gates@harvard.edu" =~ /.*@(.*)$/

This simple regexp captures everything after the @ symbol in its first group.

However, what you have to think about is how to handle cases like gates@harvard.edu vs gates@seas.harvard.edu.

My example will parse them out as different entities: harvard.edu vs seas.harvard.edu.
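
A minimal sketch of how the captured group could be read back in plain Ruby. The `domain_of` helper name and the lowercasing step are assumptions added for illustration, not part of the answer above:

```ruby
# Hypothetical helper: returns the text after the last "@" in lowercase,
# or nil when the string contains no "@".
def domain_of(email)
  match = email.match(/.*@(.*)$/)
  match && match[1].downcase
end

domain_of("gates@harvard.edu")       # => "harvard.edu"
domain_of("gates@seas.harvard.edu")  # => "seas.harvard.edu"
domain_of("not-an-email")            # => nil
```

As noted, the two Harvard addresses still come back as different domains; collapsing them is a separate decision.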

answered Dec 19 '13 at 02:48

Arman H

You can do `"gates@harvard.edu".scan(/.*@(?<domain>.*)$/)` and then `domain.split('.')[-2]`. It will return `harvard` for both `gates@harvard.edu` and `gates@seas.harvard.edu`. – Hauleth Dec 19 '13 at 02:58
@ŁukaszNiemier, you can also use negative look-ups in the regexp to parse out only the TLDs. I gave my solutions because it's not clear what the OP needs. Perhaps he wants to preserve 2nd level names... – Arman H Dec 19 '13 at 03:02
My solution preserves 2nd-level names. Moreover, it keeps all the domain parts. – Hauleth Dec 19 '13 at 09:48
`scan` isn't that useful for this. It wants to find repeated occurrences of the pattern and return an array, forcing us to deal with an array as a result. – the Tin Man Dec 19 '13 at 14:38

kddeisz · Answer 2 · 2013-12-19T14:23:02.833

I would probably go ahead and create an institution/university/group model that would hold those users. It would be easier now than later down the line. But, in an effort to answer your question, you could do something like:

array_of_emails = ['d@xyz.edu', 'a@abc.edu', 'c@xyz.edu', 'b@abc.edu']
# Sort in place by domain first, then by the local part before the "@".
array_of_emails.sort_by! { |email| email.split('@', 2).reverse }

EDIT: Changed sort! to sort_by!
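
If the goal is grouping rather than ordering, `Enumerable#group_by` may be a closer fit. This is only a sketch in plain Ruby; the `emails` and `by_domain` names are illustrative, and mapping each key to a row in an institution model like the one suggested above is assumed rather than shown:

```ruby
emails = ['d@xyz.edu', 'a@abc.edu', 'c@xyz.edu', 'b@abc.edu']

# Group addresses by everything after the "@"; the hash keys are the domains.
by_domain = emails.group_by { |email| email.split('@', 2).last }

by_domain.keys.sort   # => ["abc.edu", "xyz.edu"]
by_domain['xyz.edu']  # => ["d@xyz.edu", "c@xyz.edu"]
```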

edited Dec 19 '13 at 14:23

answered Dec 19 '13 at 02:50

kddeisz

`sort_by` is better here. – Amadan Dec 19 '13 at 03:14

the Tin Man · Answer 3 · 2013-12-19T14:42:07.407

Dealing with domains is going to get a lot more complex in the future, with new TLDs coming online. Assuming that .edu is the only educational TLD will be wrong.

A simple way to grab just the domain for now is:

"gates@harvard.edu"[/(@.+)$/, 1] # => "@harvard.edu"

That will handle things like:

"gates@mail.harvard.edu"[/(@.+)$/, 1] # => "@mail.harvard.edu"

If you don't want the @, simply shift the opening parenthesis right one character:

pattern = /@(.+)$/
"gates@harvard.edu"[pattern, 1] # => "harvard.edu"
"gates@mail.harvard.edu"[pattern, 1] # => "mail.harvard.edu"

If you want to normalize the domain to strip off sub-domains, you can do something like:

pattern = /(\w+\.\w+)$/
"harvard.edu"[pattern, 1] # => "harvard.edu"
"mail.harvard.edu"[pattern, 1] # => "harvard.edu"

which only grabs the last two "words" that are separated by a single `.`.

That's somewhat naive, as non-US domains can have a country code, so if you need to handle those you can do something like:

pattern = /(\w+\.edu(?:\.\w+)?)$/
"harvard.edu"[pattern, 1] # => "harvard.edu"
"harvard.edu.cc"[pattern, 1] # => "harvard.edu.cc"
"mail.harvard.edu.cc"[pattern, 1] # => "harvard.edu.cc"
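
The two steps above (take the part after the @, then strip sub-domains) could be combined into one helper. This is a sketch under assumptions: the `institution_domain` name, the `EDU_DOMAIN` constant, and the lowercasing are illustrative additions, and the pattern is the one shown above, so it still ignores hyphenated names and non-.edu educational domains:

```ruby
# Pattern from the answer: "name.edu", optionally followed by one more label.
EDU_DOMAIN = /(\w+\.edu(?:\.\w+)?)$/

# Hypothetical helper: reduces an email address to "name.edu" or "name.edu.cc".
# Returns nil when there is no "@" or the domain does not fit the pattern.
def institution_domain(email)
  domain = email[/@(.+)$/, 1]
  domain && domain.downcase[EDU_DOMAIN, 1]
end

institution_domain("gates@harvard.edu")          # => "harvard.edu"
institution_domain("gates@mail.harvard.edu")     # => "harvard.edu"
institution_domain("gates@mail.harvard.edu.cc")  # => "harvard.edu.cc"
institution_domain("gates@example.com")          # => nil
```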

And, as to whether you should do this before or after you've verified their address? Do it AFTER. Why waste your CPU time and disk space processing invalid addresses?

score 0 · Answer 4 · answered Dec 19 '13 at 09:53

array_of_emails = ['d@xyz.edu', 'a@abc.edu', 'c@xyz.edu', 'b@abc.edu']
# Sort by everything from the "@" onward, i.e. by domain.
sorted = array_of_emails.sort_by { |email| email[/@.*/] }
puts sorted

answered Dec 19 '13 at 09:53

devanand

Wait did you literally just copy paste my code and then put in sort_by instead of proposing an edit? – kddeisz Dec 19 '13 at 14:21
