I am trying to use the Python whois library to gather WHOIS records for some websites.

The problem is that I get almost nothing back for some websites, such as nih.gov, which is an active domain name!

import whois

w = whois.whois("nih.gov")
print(w)
{u'updated_date': None, u'status': u'ACTIVE', u'name': None, u'dnssec': None, u'city': None, u'expiration_date': None, u'zipcode': None, u'domain_name': u'NIH.GOV', u'country': None, u'whois_server': None, u'state': None, u'registrar': None, u'referral_url': None, u'address': None, u'name_servers': None, u'org': None, u'creation_date': None, u'emails': None}

I can't work out what the problem is. Which library should I use, and how, to cover all cases?

  • Compare [whois nih.gov](https://www.whois.com/whois/nih.gov) versus, say, [whois stackoverflow.com](https://www.whois.com/whois/stackoverflow.com). It appears this is all the information that `whois` provides. – unutbu Dec 24 '17 at 22:40

1 Answer

Here's some code that resolves a hostname to an IP address and asks ARIN's WHOIS server for the record covering that address.

import sys
import socket

def whois(ip):
    # ARIN's WHOIS service listens on TCP port 43.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Time limit in seconds; applies to connect() and to each recv().
    s.settimeout(3)
    s.connect(("whois.arin.net", 43))
    s.sendall(('n ' + ip + '\r\n').encode())

    response = b""
    try:
        while True:
            data = s.recv(4096)
            if not data:
                break  # the server closed the connection
            response += data
    except socket.timeout:
        pass  # keep whatever arrived before the time limit
    finally:
        s.close()

    print(response.decode())

def main():
    domain = sys.argv[1]
    ip = socket.gethostbyname(domain)
    whois(ip)

main()

For example:

c:\Temp>py test.py www.google.com

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#


#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=216.58.213.196?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#

NetRange:       216.58.192.0 - 216.58.223.255
CIDR:           216.58.192.0/19
NetName:        GOOGLE
NetHandle:      NET-216-58-192-0-1
Parent:         NET216 (NET-216-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       AS15169
Organization:   Google LLC (GOGL)
RegDate:        2012-01-27
Updated:        2012-01-27
Ref:            https://whois.arin.net/rest/net/NET-216-58-192-0-1


OrgName:        Google LLC
OrgId:          GOGL
Address:        1600 Amphitheatre Parkway
City:           Mountain View
StateProv:      CA
PostalCode:     94043
Country:        US
RegDate:        2000-03-30
Updated:        2017-12-21
Ref:            https://whois.arin.net/rest/org/GOGL


OrgAbuseHandle: ABUSE5250-ARIN
OrgAbuseName:   Abuse
OrgAbusePhone:  +1-650-253-0000
OrgAbuseEmail:  network-abuse@google.com
OrgAbuseRef:    https://whois.arin.net/rest/poc/ABUSE5250-ARIN

OrgTechHandle: ZG39-ARIN
OrgTechName:   Google LLC
OrgTechPhone:  +1-650-253-0000
OrgTechEmail:  arin-contact@google.com
OrgTechRef:    https://whois.arin.net/rest/poc/ZG39-ARIN


#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#

And specifically for www.nih.gov we get:

c:\Temp>py test.py www.nih.gov

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#


#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=23.21.241.1?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#

NetRange:       23.20.0.0 - 23.23.255.255
CIDR:           23.20.0.0/14
NetName:        AMAZON-EC2-USEAST-10
NetHandle:      NET-23-20-0-0-1
Parent:         NET23 (NET-23-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       AS16509
Organization:   Amazon.com, Inc. (AMAZO-4)
RegDate:        2011-09-19
Updated:        2014-09-03
Comment:        The activity you have detected originates from a dynamic hosting environment.
Comment:        For fastest response, please submit abuse reports at http://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/AWSAbuse
Comment:        For more information regarding EC2 see:
Comment:        http://ec2.amazonaws.com/
Comment:        All reports MUST include:
Comment:        * src IP
Comment:        * dest IP (your IP)
Comment:        * dest port
Comment:        * Accurate date/timestamp and timezone of activity
Comment:        * Intensity/frequency (short log extracts)
Comment:        * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Ref:            https://whois.arin.net/rest/net/NET-23-20-0-0-1


OrgName:        Amazon.com, Inc.
OrgId:          AMAZO-4
Address:        Amazon Web Services, Inc.
Address:        P.O. Box 81226
City:           Seattle
StateProv:      WA
PostalCode:     98108-1226
Country:        US
RegDate:        2005-09-29
Updated:        2017-01-28
Comment:        For details of this service please see
Comment:        http://ec2.amazonaws.com/
Ref:            https://whois.arin.net/rest/org/AMAZO-4


OrgAbuseHandle: AEA8-ARIN
OrgAbuseName:   Amazon EC2 Abuse
OrgAbusePhone:  +1-206-266-4064
OrgAbuseEmail:  abuse@amazonaws.com
OrgAbuseRef:    https://whois.arin.net/rest/poc/AEA8-ARIN

OrgTechHandle: ANO24-ARIN
OrgTechName:   Amazon EC2 Network Operations
OrgTechPhone:  +1-206-266-4064
OrgTechEmail:  amzn-noc-contact@amazon.com
OrgTechRef:    https://whois.arin.net/rest/poc/ANO24-ARIN

OrgNOCHandle: AANO1-ARIN
OrgNOCName:   Amazon AWS Network Operations
OrgNOCPhone:  +1-206-266-4064
OrgNOCEmail:  amzn-noc-contact@amazon.com
OrgNOCRef:    https://whois.arin.net/rest/poc/AANO1-ARIN


#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#
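
The replies above are plain `Key: value` lines, so a single field can be pulled out with a little string handling. Note that a decoded reply is just a `str`, so it has no attributes such as `.registrar`. The following is a minimal sketch, not part of the script above: `parse_whois_text` and `sample` are hypothetical names, and the sample text is a shortened copy of the nih.gov reply shown earlier.

```python
def parse_whois_text(raw):
    """Collect `Key: value` lines from raw WHOIS text into a dict of lists."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, '#' comment lines, and lines without a key.
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Shortened copy of the reply printed above (assumption: same layout).
sample = """\
#
# ARIN WHOIS data and services are subject to the Terms of Use
#
NetName:        AMAZON-EC2-USEAST-10
OrgName:        Amazon.com, Inc.
Address:        Amazon Web Services, Inc.
Address:        P.O. Box 81226
Ref:            https://whois.arin.net/rest/org/AMAZO-4
"""

record = parse_whois_text(sample)
print(record["OrgName"])
print(record["Address"])
```

Each key maps to a list because some fields, such as `Address`, appear more than once in a reply. Field names differ between WHOIS servers, so the keys to look up depend on which server answered.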

Here's another option.

This code fetches the whois.com results page for a domain, extracts its text, and writes it to whoisData.txt in the current working directory. You can modify it to suit your needs; I've just written the basics.

import sys
import urllib.request
from bs4 import BeautifulSoup

def writeFile(text):
    # The with block closes the file on exit, so no explicit close() is needed.
    with open('whoisData.txt', "w", encoding="utf-8") as f:
        f.write(text)

def readHTML(domain):
    url = 'https://www.whois.com/whois/' + domain
    html = urllib.request.urlopen(url).read()
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(html, "html.parser")

    # remove all script and style elements
    for script in soup(["script", "style"]):
        script.extract()

    # get text
    text = soup.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # split lines that hold several phrases separated by double spaces
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    writeFile(text)

def main():
    domain = sys.argv[1]
    readHTML(domain)

main()

I took some reference from here (on parsing HTML).

oBit91
  • Thank you @oBit91, I will use that. Very efficient. One more thing: when I was using the Python whois library, I could extract the registrar record from the WHOIS response. Can we extract the registrar name with response.decode().registrar? I am sorry, I do not have access to my laptop right now. – Shahrooz Pooryousef Dec 25 '17 at 00:15
  • But it seems it does not return the WHOIS records. I tested with python test.py bbc.com and the printed values are not the same as my whois.whois("bbc.com") output, which is more accurate. – Shahrooz Pooryousef Dec 25 '17 at 00:23
  • @ShahroozPooryousef I've added a second option that uses an online web service and parses the HTML. Either way, you can see what suits you best. – oBit91 Dec 25 '17 at 01:46
  • `whois.arin.net` is to be used for IP addresses, so you query it with a hostname that resolves (`www.google.com` in your examples) and not with a domain name, which may in fact not resolve. As for using `whois.com`, this is just one of the many companies providing whois access like that; you should make sure to read their TOS and be happy if they do not collect nor use your data... (to put it another way: it is best to query registry whois servers instead of 3rd parties) – Patrick Mevzek Jan 02 '18 at 15:39
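
Following the last comment, a domain can be looked up at its registry's WHOIS server instead of at ARIN or a third-party site. The following is a minimal sketch under stated assumptions, not code from the answer: it assumes that `whois.iana.org` answers a query for the top-level domain with a `refer:` line naming the registry's server, which may not hold for every TLD. The names `query_whois`, `find_referral`, and `domain_whois` are hypothetical.

```python
import socket


def query_whois(server, query, timeout=5.0):
    """Send one WHOIS query (the query plus CRLF on TCP port 43) and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((query + "\r\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # the server closes the connection when it is done
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")


def find_referral(iana_reply):
    """Return the server named on a `refer:` line of the reply, or None."""
    for line in iana_reply.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "refer":
            return value.strip() or None
    return None


def domain_whois(domain):
    """Ask IANA which server handles the TLD, then query that server."""
    tld = domain.rsplit(".", 1)[-1]
    server = find_referral(query_whois("whois.iana.org", tld))
    if server is None:
        # Assumption failed: IANA's reply had no `refer:` line for this TLD.
        raise LookupError("no WHOIS referral found for ." + tld)
    return query_whois(server, domain)
```

Calling `print(domain_whois("nih.gov"))` would print whatever the referred server returns. Registries decide how much they publish, so the reply may still contain only a few fields.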