Keeping domain of Email but removing TLD

Question

I am using Python and I want to keep the domain of an email address but remove the TLD ('.com', '.co.uk', '.us', etc.).

So if I have an email, say random@gmail.com, I want only @gmail left, as a string, and I want this to work for any email. So random@yahoo.com would leave me with @yahoo, and random@aol.uk would leave me with @aol.

So far I have:

 import re
 domain = re.search(r"@[\w.]+", val)
 domain = domain.group()

That returns the domain with the TLD still attached, e.g. @gmail.com or @aol.co.uk.

score 3 · Answer 1 · answered Aug 04 '16 at 13:23

You can do this with two splits:

val = string.split('@')[1].split('.')[0]

Replace 'string' with the name of the variable holding your email.

This takes everything after the '@' symbol, then everything up to the first '.'.

Applied to 'random@gmail.com', it gives 'gmail'.

If you need the '@' symbol, you can add it back with:

full = '@' + val
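
For reuse, the two splits can be wrapped in a small function. This is only a sketch: the name domain_without_tld and the keep_at flag are made up for illustration, and it assumes the input contains exactly one '@'.

```python
def domain_without_tld(email, keep_at=True):
    """Return the first label after '@', e.g. '@gmail' for 'random@gmail.com'."""
    parts = email.split("@")
    # Assumption: anything without exactly one '@' is rejected, not guessed at.
    if len(parts) != 2 or not parts[1]:
        raise ValueError("not an email address: %r" % (email,))
    # Everything up to the first '.' of the part after '@'.
    label = parts[1].split(".")[0]
    return "@" + label if keep_at else label
```

Calling domain_without_tld('random@aol.co.uk') should give '@aol', and passing keep_at=False drops the '@'.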

Scott Stainton

You type faster than me. Same idea. Not sure why pandas is necessary, but that also worked. – joel goldstick Aug 04 '16 at 13:40
Yeah yours also worked but accepted his answer since it fit my needs better, but would've accepted yours had he not put his up. Thanks though! – Kalimantan Aug 04 '16 at 14:39

joel goldstick · Answer 2 · 2016-08-04T13:38:45.860

2

First split on "@" and take the part after it. Then split on "." and take the first part.

>>> email = "this.that@gmail.com.x.y"
>>> '@' + email.split("@")[1].split(".")[0]
'@gmail'

edited Aug 04 '16 at 13:38

answered Aug 04 '16 at 13:25

Solution worked but accepted @jezrael because he kept the @ symbol. Thanks though, and I realize I can just change the index on the [1] – Kalimantan Aug 04 '16 at 13:34
@Kalimantan I'm not a pandas guy. It seems like it adds complexity. But your problem is solved. I added to my answer – joel goldstick Aug 04 '16 at 13:39

jezrael · Accepted Answer · 2016-08-04T13:39:49.143

With pandas, use the str.split string method:

df = pd.DataFrame({'a':['random@yahoo.com','random@aol.uk','random@aol.co.uk']})

print (df)
                  a
0  random@yahoo.com
1     random@aol.uk
2  random@aol.co.uk

print ('@' + df.a.str.split('@').str[1].str.split('.', 1).str[0] )
0    @yahoo
1      @aol
2      @aol
Name: a, dtype: object

Using apply is faster, though, provided the column contains no NaN values:

df = pd.concat([df]*10000).reset_index(drop=True)

print ('@' + df.a.str.split('@').str[1].str.split('.', 1).str[0] )
print (df.a.apply(lambda x: '@' + x.split('@')[1].split('.')[0]))

In [363]: %timeit ('@' + df.a.str.split('@').str[1].str.split('.', 1).str[0] )
10 loops, best of 3: 79.1 ms per loop

In [364]: %timeit (df.a.apply(lambda x: '@' + x.split('@')[1].split('.')[0]))
10 loops, best of 3: 27.7 ms per loop
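
The lambda above fails on NaN because a float has no split method. If the column may contain missing values, a named function that checks the type first is one way around it. A minimal sketch, with first_label as a made-up name and the choice to return None for bad rows as my assumption:

```python
def first_label(value):
    """Return '@' plus the first domain label, or None for non-email values."""
    # Missing values in a pandas column usually arrive as float NaN or None;
    # neither is a str, so they are passed through as None instead of raising.
    if not isinstance(value, str) or "@" not in value:
        return None
    return "@" + value.split("@")[1].split(".")[0]
```

It would be used as df.a.apply(first_label). I have not timed it, so the numbers above do not apply to it.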

Another solution, using extract, is faster than the str.split chain, and it also works if the column contains NaN values:

# not sure this covers all valid characters in an email address
print ( '@' + df.a.str.extract(r"\@([A-Za-z0-9_]+)\.", expand=False))

In [365]: %timeit ( '@' + df.a.str.extract(r"\@([A-Za-z0-9_]+)\.", expand=False))
10 loops, best of 3: 39.7 ms per loop
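
The same capture-group idea works on a single string with the standard re module, without pandas. A rough sketch: at_domain is a made-up name, and the hyphen in the character class is my addition (the pattern above does not allow it), on the assumption that domains like my-company.com should match.

```python
import re

# Capture the first label after '@', up to the next '.'.
# The trailing '-' in the class is a literal hyphen (an assumption, see above).
FIRST_LABEL = re.compile(r"@([A-Za-z0-9_-]+)\.")

def at_domain(email):
    """Return '@' plus the first domain label, or None if nothing matches."""
    match = FIRST_LABEL.search(email)
    return "@" + match.group(1) if match else None
```

Strings with no '@', or with no '.' after it, give None here rather than an exception.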

What happens to john@arc.nasa.gov, or rms@mail.york.ac.uk? – James K Aug 04 '16 at 14:11
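
For what it's worth, the split-based approaches keep only the first label after the '@', so addresses with subdomains come back as the subdomain. A quick check in plain Python (the variable names are just for illustration):

```python
emails = ["john@arc.nasa.gov", "rms@mail.york.ac.uk"]

# Same expression as in the answers: part after '@', then up to the first '.'.
first_labels = ["@" + e.split("@")[1].split(".")[0] for e in emails]

print(first_labels)  # ['@arc', '@mail']
```

Getting '@nasa' or '@york' instead would need a list of public suffixes, which none of the approaches in this thread use.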

score 0 · Answer 4 · answered Aug 04 '16 at 15:55

For posterity and completeness, this can also be done via index and slice:

email = 'random@aol.co.uk'
at = email.index('@')
dot = email.index('.', at)
domain = email[at:dot]

Using split() and re seems like overkill when the goal is to extract a single substring.
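
One caveat: str.index raises ValueError when the character is not found, so the snippet above fails on a string with no '@', or with no '.' after it. A hedged variant using str.find, which returns -1 instead; slice_domain is a made-up name, and returning everything after '@' when there is no dot is my assumption about what is wanted:

```python
def slice_domain(email):
    """Slice from '@' up to the first '.' after it; None if there is no '@'."""
    at = email.find("@")
    if at == -1:
        return None
    # Search for the dot only from the '@' onwards, so dots in the
    # local part ('this.that@...') are ignored.
    dot = email.find(".", at)
    # No dot after '@' (e.g. 'user@localhost'): keep everything from '@'.
    return email[at:] if dot == -1 else email[at:dot]
```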
