I'm trying to extract the length and the suffix (TLD) of each website in a pandas DataFrame:
```
Website              Label
18egh.com            1
fish.co.uk           0
www.description.com  1
http://world.com     1
```
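My desired output is below, where Length is the number of characters in the domain name itself, without the www. prefix or the suffix: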
```
Website              Label  Length  Tld
18egh.com            1      5       com
fish.co.uk           0      4       co.uk
www.description.com  1      11      com
http://world.com     1      5       com
```
I tried the length first, as follows:
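```python
from urllib.parse import urlparse

def get_domain(df):
    my_list = []
    for x in df['Website'].tolist():
        # netloc is the host part of a URL, e.g. 'world.com' for 'http://world.com'
        domain = urlparse(x).netloc
        my_list.append(domain)
    df['Domain'] = my_list
    df['Length'] = df['Domain'].str.len()
    return df
```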
However, when I check the result, the list is empty. I think urlparse is probably enough to extract the domain and the TLD, but if I'm wrong I'd appreciate it if you could point me in the right direction.
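A likely reason for the empty result, offered as a sketch rather than a definitive answer: urlparse only fills netloc when the string starts with a scheme such as http://, so for 18egh.com, fish.co.uk and www.description.com the whole string lands in path and netloc is ''. The code below prepends http:// when no scheme is present, takes the host, and splits it into labels. The names split_website and MULTI_LABEL_SUFFIXES are hypothetical, and the suffix set is a hand-picked assumption that only covers co.uk; it is not a real suffix list.

```python
from urllib.parse import urlparse

# Without a scheme, urlparse leaves netloc empty and puts everything in path.
print(repr(urlparse("18egh.com").netloc))         # ''
print(repr(urlparse("http://world.com").netloc))  # 'world.com'

# Assumption: a tiny hand-picked set of two-label suffixes, for illustration only.
MULTI_LABEL_SUFFIXES = {"co.uk"}


def split_website(website):
    """Return (length of the domain name, suffix), e.g. (4, 'co.uk') for 'fish.co.uk'."""
    # Add a scheme when missing so urlparse puts the host into netloc/hostname.
    if "://" not in website:
        website = "http://" + website
    host = urlparse(website).hostname or ""
    labels = host.split(".")
    # Treat the last two labels as the suffix only if they form a known pair.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        n_suffix = 2
    else:
        n_suffix = 1
    suffix = ".".join(labels[-n_suffix:])
    # The domain name is the label just before the suffix, so a subdomain
    # such as "www" is skipped.
    name = labels[-n_suffix - 1] if len(labels) > n_suffix else ""
    return len(name), suffix


for site in ["18egh.com", "fish.co.uk", "www.description.com", "http://world.com"]:
    length, tld = split_website(site)
    print(site, length, tld)
```

On the DataFrame from the question, the two columns could then be filled with df['Length'], df['Tld'] = zip(*df['Website'].map(split_website)). For real data, the third-party tldextract package, which is built on the Public Suffix List, handles multi-label suffixes such as co.uk without a hand-maintained set.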