I've written a script that uses the Python package ipwhois. Its IPWhois class returns parsed IP whois records, like this:

{'asn': '13968',
 'asn_cidr': '12.231.58.0/24',
 'asn_country_code': 'US',
 'asn_date': '1983-08-23',
 'asn_description': 'CAISO-NET-BLK, US',
 'asn_registry': 'arin',
 'nets': [{'address': '200 S. Laurel AVE.',
           'cidr': '12.0.0.0/8',
           'city': 'MIDDLETOWN',
           'country': 'US',
           'created': '1983-08-22',
           'description': 'AT&T Services, Inc.',
           'emails': ['abuse@att.net',
                      'jb3310-arin@oz.mt.att.com',
                      'hk2514@att.com',
                      'bm870e@intl.att.com',
                      'swipid@icorefep1.ims.att.com',
                      'addrmgt@qsun.att.com'],
           'handle': 'NET-12-0-0-0-1',
           'name': 'ATT',
           'postal_code': '07748',
           'range': '12.0.0.0 - 12.255.255.255',
           'state': 'NJ',
           'updated': '2013-12-19'},
          {'address': '1000 S FREMONT ST',
           'cidr': '12.231.58.0/24',
           'city': 'ALHAMBRA',
           'country': 'US',
           'created': '2009-03-06',
           'description': 'CALIFORNIA ISO',
           'emails': ['RMelis@caiso.com', 'shendrickson@caiso.com'],
           'handle': 'NET-12-231-58-0-1',
           'name': 'CALIFORN50-58',
           'postal_code': '91803',
           'range': None,
           'state': 'CA',
           'updated': '2009-03-06'}],
 'nir': None,
 'query': '12.231.58.214',
 'raw': None,
 'raw_referral': None,
 'referral': None}

My full script is below. It reads a list of IP addresses from a file, performs the ipwhois lookup on each IP, then outputs the IP, the CIDR range, and an email address from each whois record.

import pandas as pd
from ipwhois import IPWhois

def get_email(x):
    # Return the first email of the first net, or None if it is missing
    try:
        email = x['nets'][0]['emails'][0]
    except (TypeError, KeyError, IndexError):
        email = None
    return email

# One IP address per row; the file has no header line
df = pd.read_csv('C:[file_path]ip_test.csv', names=['ip'])
df['whois_obj'] = df['ip'].apply(IPWhois)
# Run the whois lookup for every IP
df['result'] = df['whois_obj'].apply(lambda x: x.lookup_whois(asn_methods=['dns','whois','http']))
df['cidr'] = df['result'].apply(lambda x: x['nets'][0]['cidr'])
df['email_orgdomain'] = df['result'].apply(get_email)
#df['email_orgdomain'] = df['result'].apply(lambda x: x['nets'][0]['emails'][0].split('@')[-1])
# Drop the intermediate columns before displaying the frame
df.drop(columns=['whois_obj', 'result'], inplace=True)
df

My current data frame output looks like the following. However, I want the last column to show ALL unique email addresses present in a whois record. The emails in this example are just for contextual reference:

    ip              cidr               email
0   50.28.53.255    50.28.0.0/17       liquidweb.com
1   82.94.177.112   82.94.177.96/27    xs4all.nl
2   213.206.90.234  213.206.90.128/25  kpn.com
3   85.222.239.107  85.222.236.0/22    atom86.net
4   85.222.239.101  85.222.236.0/22    atom86.net
5   91.213.201.45   91.213.201.0/24    None
6   43.251.240.6    43.251.240.0/22    au.abnamroclearing.com
7   140.86.230.51   140.86.230.0/24    oracle.com

I'm looking for help looping through all email addresses within my get_email(x) function and adding each unique address to a set. Please note that in the whois data example I've provided, there are two fields titled 'emails': one with att.com domains and, farther down, another with caiso.com domains. This is because some ARIN records list registrant information for the parent CIDR as well as the child CIDR.

I would like to collect all unique email addresses from all fields titled 'emails'.
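A minimal sketch of the behavior described, assuming each lookup result is a dictionary shaped like the example above. The name get_emails and the trimmed sample record are hypothetical, introduced only for illustration. The sketch walks every entry in 'nets', not just the first, and treats a missing or None 'emails' value as empty:

```python
def get_emails(result):
    """Collect the unique emails from every 'nets' entry of a whois result dict."""
    emails = set()
    # 'nets' may be missing or None; treat both as an empty list
    for net in (result or {}).get('nets') or []:
        # 'emails' may be None when a record lists no contacts
        for email in net.get('emails') or []:
            emails.add(email)
    # Return a sorted list so the column value is stable between runs
    return sorted(emails)


# Hypothetical sample, trimmed from the record shown earlier
sample = {
    'nets': [
        {'cidr': '12.0.0.0/8', 'emails': ['abuse@att.net', 'hk2514@att.com']},
        {'cidr': '12.231.58.0/24', 'emails': ['RMelis@caiso.com', 'abuse@att.net']},
        {'cidr': '91.213.201.0/24', 'emails': None},
    ]
}

print(get_emails(sample))
```

Applied to the data frame, this would replace the single-email column with something like df['emails'] = df['result'].apply(get_emails), run before the 'result' column is dropped. Returning a sorted list rather than the set itself is a design choice to keep the output deterministic.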

Thank you very much for any help!
