
Background

Running this snippet in the Python interpreter returns an IP address for gov.uk.

>>> import socket
>>> socket.gethostbyname('gov.uk')
'151.101.64.144'

gov.uk is a TLD according to Wikipedia and the Public Suffix List. Similar TLDs that are also domains include gov.au, gov.br, and s3.amazonaws.com.

While trying to answer this question with Python, I tried urlparse, but it only returns the whole host as a single blob:

>>> from urllib.parse import urlparse
>>> urlparse('http://gov.uk')
ParseResult(scheme='http', netloc='gov.uk', path='', params='', query='', fragment='')
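urlparse has no notion of public suffixes; the most it gives you is the host. A minimal stdlib-only sketch (the helper name host_labels is hypothetical) that pulls out the host's dot-separated labels, where the last label is the TLD in the strict sense:

```python
from urllib.parse import urlparse


def host_labels(url: str) -> list[str]:
    """Return the dot-separated labels of the URL's host (hypothetical helper)."""
    # .hostname drops any port/userinfo and lowercases the host;
    # it is None when the URL has no network location at all.
    host = urlparse(url).hostname
    if host is None:
        return []
    # Strip a trailing dot (fully qualified form) before splitting.
    return host.rstrip(".").split(".")


labels = host_labels("http://gov.uk")
print(labels)      # ['gov', 'uk']
print(labels[-1])  # uk  (the TLD in the strict, last-label sense)
```

Nothing in the labels themselves says where the public suffix ends; that needs an external list such as the Public Suffix List.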

tldextract reports neither a domain nor a subdomain:

>>> import tldextract
>>> tldextract.extract('https://gov.uk')
ExtractResult(subdomain='', domain='', suffix='gov.uk')
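A split like this comes from matching the host against the Public Suffix List and keeping the longest listed suffix. The sketch below imitates that idea under stated assumptions: SUFFIXES is a hypothetical, hand-picked subset of the list, split_host is an illustrative name rather than tldextract's API, and the list's wildcard (*.) and exception (!) rules are ignored. Because gov.uk is itself a listed suffix, nothing is left over for the domain:

```python
# Hypothetical, hand-picked subset of Public Suffix List rules (illustration only).
# The real list has thousands of entries plus wildcard and exception rules.
SUFFIXES = {"uk", "co.uk", "gov.uk", "au", "gov.au", "br", "gov.br",
            "com", "s3.amazonaws.com"}


def split_host(host: str) -> tuple[str, str, str]:
    """Split host into (subdomain, domain, suffix) via longest-suffix match."""
    labels = host.lower().rstrip(".").split(".")
    # Default: the PSL's implicit "*" rule makes the last label a suffix.
    start = len(labels) - 1
    # Scan from the longest candidate (the whole host) to the shortest,
    # so the first hit is the longest matching suffix.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            start = i
            break
    suffix = ".".join(labels[start:])
    # The domain is the single label just left of the suffix, if any.
    domain = labels[start - 1] if start >= 1 else ""
    # Everything further left is the subdomain.
    subdomain = ".".join(labels[: max(start - 1, 0)])
    return subdomain, domain, suffix


print(split_host("gov.uk"))         # ('', '', 'gov.uk')
print(split_host("www.bbc.co.uk"))  # ('www', 'bbc', 'co.uk')
```

The empty domain is therefore not a tldextract bug so much as a consequence of gov.uk being listed as a public suffix.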

Question

For https://gov.uk, which part is the domain and which part is the TLD?

Ross Jacobs
  • The Wikipedia page you link to explicitly lists `.gov.uk` in the "Second-level domains" part of the article. – Thierry Lathuille Oct 02 '21 at 17:29
  • @ThierryLathuille Should the result of `tldextract.extract('https://gov.uk')` be `ExtractResult(subdomain='', domain='gov', suffix='uk')`? If so, it is a point that many tools, including tldextract, are confused about. – Ross Jacobs Oct 02 '21 at 17:38
  • There is no confusion to have. All domains are subdomains, and hence all subdomains are also domains; it all depends on where you look from, so it comes down to WHY you need this extraction and what you do later with the parts. Use the terminology at https://url.spec.whatwg.org/#host-miscellaneous; it is clear. `gov.uk` is a public suffix (even if registration under it is certainly not so public), which is a better term than `eTLD` or "effective TLD". One has to understand there are two facets: the resolution part (where everything is a domain) and the registration part. – Patrick Mevzek Oct 03 '21 at 01:14
  • If you want the TLD in its strict definition, there is only one: the string after the last dot, so `uk` here. But that may not be useful for what you need to do, so the public suffix, `gov.uk`, may be more interesting. Distinctions like this have consequences when searching for administrative boundaries. The terms "domain" and "subdomain" should be avoided in this context, as they do not help at all, and so should "suffix". There is either a public suffix or a TLD; a "suffix" or "extension" does not exist. – Patrick Mevzek Oct 03 '21 at 01:17
  • Given that you have two long consecutive comments, would you like to post an answer instead? – Ross Jacobs Oct 03 '21 at 04:17

Answer


gov.uk, like uk, is an effective TLD, or eTLD.

I picked this term up from the Go package publicsuffix (golang.org/x/net/publicsuffix) and the Wikipedia page for the Public Suffix List.

Mozilla created the Public Suffix List, which is now hosted at https://publicsuffix.org/list/. The term eTLD appears in Mozilla's documentation, but it does not appear anywhere on https://publicsuffix.org/list/ at the time of writing.
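The Go package mentioned above exposes this idea as publicsuffix.PublicSuffix and publicsuffix.EffectiveTLDPlusOne. Below is a rough Python sketch of the same two concepts, assuming a tiny hypothetical rule set in place of the real list and ignoring wildcard and exception rules: the eTLD is the longest matching public suffix, and the eTLD+1 (the registrable domain) is that suffix plus one more label. A name such as gov.uk that is itself a public suffix has no eTLD+1; the Go function returns an error in that case, and this sketch raises ValueError:

```python
# Hypothetical subset of Public Suffix List rules (illustration only).
PUBLIC_SUFFIXES = {"uk", "co.uk", "gov.uk", "com"}


def public_suffix(host: str) -> str:
    """Return the eTLD: the longest listed suffix of host."""
    labels = host.lower().rstrip(".").split(".")
    # Longest candidate first, so the first hit is the longest match.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in PUBLIC_SUFFIXES:
            return candidate
    return labels[-1]  # implicit "*" rule: an unlisted TLD is its own suffix


def etld_plus_one(host: str) -> str:
    """Return the eTLD+1; raise ValueError if host is itself a public suffix."""
    host = host.lower().rstrip(".")
    suffix = public_suffix(host)
    if host == suffix:
        raise ValueError(f"{host!r} is a public suffix; it has no eTLD+1")
    # Keep the suffix's labels plus exactly one more label to the left.
    labels = host.split(".")
    n = suffix.count(".") + 2
    return ".".join(labels[-n:])


print(public_suffix("gov.uk"))          # gov.uk
print(etld_plus_one("news.bbc.co.uk"))  # bbc.co.uk
```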

Ross Jacobs
  • This nuance with eTLDs is sort of implied in the [README for tldextract](https://github.com/john-kurkowski/tldextract#readme) (especially with examples like `bbc.co.uk`), but it would certainly help if it were made more explicit. – ekhumoro Oct 02 '21 at 19:30
  • Documentation that has "nuance that is sort of implied" is documentation that needs improvement. I'll post an issue. – Ross Jacobs Oct 02 '21 at 19:33
  • Posted issue here: https://github.com/john-kurkowski/tldextract/issues/234 – Ross Jacobs Oct 02 '21 at 20:00