I'm using Python's whois module to check for free domains in the .by zone. The module doesn't currently support that zone, but all I need to do is add this code to .../lib/python3.8/site-packages/whois/tld_regexpr.py:

by = {
'extend': 'com'
}

I don't think it is right to hardcode this into the lib folder. My code currently looks like this:

import whois

def free_domains(domain_list):
    """Looking for free domains"""
    free_d = []
    for domain in domain_list:
        if whois.query(domain) is None:
            free_d.append(domain)
    return free_d

But it doesn't work without that injection. How can I extend tld_regexpr.py from my own .py file?

Iguananaut
S. A.

1 Answer

For reference, see the source code of whois.tld_regexpr. It is used in whois._2_parse like so:

import re

from . import tld_regexpr

TLD_RE = {}


def get_tld_re(tld):
    if tld in TLD_RE:
        return TLD_RE[tld]
    v = getattr(tld_regexpr, tld)
    extend = v.get('extend')

    if extend:
        e = get_tld_re(extend)
        tmp = e.copy()
        tmp.update(v)
    else:
        tmp = v

    if 'extend' in tmp:
        del tmp['extend']

    TLD_RE[tld] = dict((k, re.compile(v, re.IGNORECASE) if isinstance(v, str) else v) for k, v in tmp.items())
    return TLD_RE[tld]


[get_tld_re(tld) for tld in dir(tld_regexpr) if tld[0] != '_']

As we can see, this runs some module-level code that generates regular expressions from the data in tld_regexpr and caches them in the global TLD_RE table.
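
To make the 'extend' mechanism concrete, here is a minimal self-contained sketch of the same resolve-and-cache logic. It does not import whois at all: the tld_regexpr object below is a stand-in namespace, and its two entries are illustrative assumptions rather than the library's real tables.

```python
import re
from types import SimpleNamespace

# Stand-in for whois.tld_regexpr (assumption: only two illustrative entries).
tld_regexpr = SimpleNamespace(
    com={
        "domain_name": r"Domain Name:\s?(.+)",
        "registrar": r"Registrar:\s?(.+)",
    },
    by={"extend": "com"},  # 'by' inherits every pattern from 'com'
)

TLD_RE = {}  # cache of compiled patterns, keyed by TLD


def get_tld_re(tld):
    """Resolve 'extend' chains, compile string patterns, and cache the result."""
    if tld in TLD_RE:
        return TLD_RE[tld]
    spec = getattr(tld_regexpr, tld)
    parent = spec.get("extend")
    # Start from a copy of the parent's (already compiled) patterns, if any.
    merged = dict(get_tld_re(parent)) if parent else {}
    merged.update(spec)
    merged.pop("extend", None)
    # Only plain strings still need compiling; inherited patterns are kept as-is.
    TLD_RE[tld] = {
        key: re.compile(value, re.IGNORECASE) if isinstance(value, str) else value
        for key, value in merged.items()
    }
    return TLD_RE[tld]


by_re = get_tld_re("by")
match = by_re["domain_name"].search("Domain Name: example.by")
print(match.group(1))
```

The point to notice is that an entry only becomes usable once get_tld_re has been called for it, because lookups go through the TLD_RE cache rather than the raw table.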

Annoyingly, there is no easy way to extend tld_regexpr before this happens, because the module is imported from the top-level __init__.py. After that, the internal code doesn't even use get_tld_re anymore, even though it provides an interface to the cache :/ So you need to call get_tld_re explicitly on your new TLD after you add it. Something like:

from whois import tld_regexpr
from whois._2_parse import get_tld_re  # a "private" module, but they leave us little choice
tld_regexpr.by = {
    'extend': 'com'
}
get_tld_re('by')
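
If you need to register more than one TLD, the same two steps can be wrapped in a small helper. This is only a sketch: register_tld is a hypothetical name, not part of whois, and it relies on the private whois._2_parse module shown above, so it may break if the package is reorganized.

```python
def register_tld(name, spec):
    """Add a TLD definition to whois at runtime and compile its patterns.

    Hypothetical helper, not part of the whois package.
    """
    # Imported inside the function so that merely defining the helper
    # does not require whois to be installed.
    from whois import tld_regexpr
    from whois._2_parse import get_tld_re  # private module, may change

    setattr(tld_regexpr, name, spec)  # same effect as tld_regexpr.by = {...}
    return get_tld_re(name)           # compiles the patterns and fills TLD_RE


# Intended usage, run once at startup before any query:
#     register_tld("by", {"extend": "com"})
#     whois.query("bayern.by")
```

Calling it once before the first whois.query is enough, since the compiled patterns stay in the TLD_RE cache for the lifetime of the process.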

Before:

>>> whois.query('bayern.by')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/embray/src/python-whois/whois/__init__.py", line 60, in query
    raise UnknownTld('Unknown TLD: %s\n(all known TLD: %s)' % (tld, list(TLD_RE.keys())))
whois.exceptions.UnknownTld: Unknown TLD: by
(all known TLD: ['com', 'uk', 'ac_uk', 'ar', 'at', 'pl', 'be', 'biz', 'br', 'ca', 'cc', 'cl', 'club', 'cn', 'co', 'jp', 'co_jp', 'cz', 'de', 'edu', 'eu', 'fr', 'id', 'info', 'io', 'it', 'kr', 'kz', 'ru', 'lv', 'me', 'mobi', 'mx', 'name', 'net', 'nyc', 'nz', 'online', 'org', 'pharmacy', 'press', 'pw', 'store', 'rest', 'ru_rf', 'security', 'sh', 'site', 'space', 'tech', 'tel', 'theatre', 'tickets', 'tv', 'us', 'uz', 'video', 'website', 'wiki', 'xyz'])

After:

>>> from whois import tld_regexpr
>>> from whois._2_parse import get_tld_re  # a "private" module, but they leave us little choice
>>> tld_regexpr.by = {
...     'extend': 'com'
... }
>>> get_tld_re('by')
{'domain_name': re.compile('Domain Name:\\s?(.+)', re.IGNORECASE), 'registrar': re.compile('Registrar:\\s?(.+)', re.IGNORECASE), 'registrant': None, 'creation_date': re.compile('Creation Date:\\s?(.+)', re.IGNORECASE), 'expiration_date': re.compile('Registry Expiry Date:\\s?(.+)', re.IGNORECASE), 'updated_date': re.compile('Updated Date:\\s?(.+)$', re.IGNORECASE), 'name_servers': re.compile('Name Server:\\s*(.+)\\s*', re.IGNORECASE), 'status': re.compile('Status:\\s?(.+)', re.IGNORECASE), 'emails': re.compile('[\\w.-]+@[\\w.-]+\\.[\\w]{2,4}', re.IGNORECASE)}
>>> whois.query('bayern.by')
<whois._3_adjust.Domain object at 0x6ffffbbc9e8>

I guess the module doesn't have the best design for extensibility, but that's OK; it could be fixed with some small tweaks. In the meantime, you should submit a PR to the author to add more ccTLDs, or to make extensibility easier.

Iguananaut
  • Importing `tld_regexpr` causes `whois` itself to be imported, so the machinery that uses `tld_regexpr` has already run before you patch it. – chepner Jan 31 '20 at 16:32
  • Yes, I explained exactly that, and how to work around it. – Iguananaut Jan 31 '20 at 16:36
  • Never mind; when I was testing something similar, I thought reimporting `whois` in order to run `query` somehow reset the data. Not sure what I had done differently, but `import whois; whois.query(...)` does seem to work after this patch. – chepner Jan 31 '20 at 16:46
  • Added demonstration. – Iguananaut Jan 31 '20 at 16:47