There is currently no live, public, free service that does what you want, and even if technical solutions for it arrive "soon", they will probably either not be public, not be free, or be heavily limited.
There is at least one possible shortcut (using zonefiles); your question is not detailed enough to be sure it fits, but see below. Depending on your use case it may work better/faster than using the DNS. It has benefits and drawbacks.
I will also discuss other points to put things in perspective, and my reply is generic (it applies to multiple TLDs and in multiple ways). But it won't give you a ready-made script to just use: this website is not a code-writing service, and your problem, with the specific constraints outlined, is far too big for that.
I won't repeat the solution based on DNS queries as it was already given, even if that answer can be improved (you absolutely need to query the registry's authoritative nameservers, not recursive ones!).
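To illustrate only that last point, here is a rough sketch with dig (the second domain name is just a placeholder, and the output is abbreviated): query one of the TLD's authoritative nameservers directly, with recursion disabled, and look at the status in the reply header; NXDOMAIN means the name is not published in the registry's zone.
$ dig +short com. NS
a.gtld-servers.net.
b.gtld-servers.net.
[...]
$ dig @a.gtld-servers.net. stackoverflow.com. NS +norecurse | grep -w status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: ...
$ dig @a.gtld-servers.net. some-name-that-is-not-registered.com. NS +norecurse | grep -w status
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: ...
Remember the caveat discussed in the zonefile section below: a registered but unpublished domain will also come back as NXDOMAIN.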
RDAP
A slight parenthesis first: nowadays and specifically in gTLDs, RDAP should become the new standard. It is far better than whois since it is JSON over HTTPS, so it allows you to get structured data back.
It also includes the distinction between a full lookup and a simple existence check, which whois doesn't have (some registries have a "domain availability check", e.g. over finger; there was an IETF protocol for that, called IRIS D-CHK, but it was implemented by at most 2 registries, and being compressed XML over UDP it never got traction).
See RFC 7480 §4:
Clients use the GET method to retrieve a response body and use the
HEAD method to determine existence of data on the server.
Example:
$ curl --head https://rdap.verisign.com/com/v1/domain/stackoverflow.com
HTTP/1.1 200 OK
Content-Length: 2264
Content-Type: application/rdap+json
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload
$ curl --head https://rdap.verisign.com/com/v1/domain/stackoverflow-but-does-not-exist.com
HTTP/1.1 404 Not Found
Content-Type: application/rdap+json
Access-Control-Allow-Origin: *
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload
(if you do a GET in the first case, you will get back a JSON document you can process with jq
or equivalent).
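For instance, a minimal sketch (field names follow the RDAP JSON response specification; exact content varies per registry):
$ curl -s https://rdap.verisign.com/com/v1/domain/stackoverflow.com | jq -r '.ldhName, .status[]?'
This prints the domain name in LDH form followed by its status values (the ? just avoids an error if the registry omits the status member).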
Note also that "partial search" is baked inside this new protocol, see 4.1. Partial String Searching. It is a very simple case and not a regex: you can just use a wildcard. Of course, registry RDAP servers are not mandated to implement it.
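Purely as an illustration (the base URL here is hypothetical, and as just said most registries do not expose the search endpoints), such a query uses the standard domains?name= search path:
$ curl -s 'https://rdap.example-registry.example/domains?name=stack*.com' | jq -r '.domainSearchResults[].ldhName'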
Other work is under way to get a full "regex" search capability; see
Registration Data Access Protocol (RDAP) Search Using POSIX Regular Expressions and, to a lesser extent, Registration Data Access Protocol (RDAP) Reverse search capabilities.
You can learn more about RDAP:
So even if you apply the solution of DNS then whois, I still highly suggest that you switch to DNS then RDAP.
Caveat: multiple registries' and registrars' RDAP servers are currently misbehaving/not respecting the specification. This will be straightened out in the future, when ICANN compliance kicks in and RDAP really starts to overshadow whois.
Registrars' API
Various registrars give you access to an API, which may include searching for available domain names and/or retrieving lists of domain names (e.g. dropping names, etc.).
What each registrar provides, and under which constraints, will of course vary, so it is impossible to give a definitive answer here. But for any serious research that would be a first stop: go to your preferred registrar and ask what services it offers that could help in your case.
It will obviously depend on which TLDs the registrar is accredited for: registrars accredited with a registry have a live, non-public channel - using a protocol called EPP - to check for domain name existence.
Whois bulk access
This exists but is, in practice, almost impossible to use.
For gTLDs, registrars are under contract with ICANN. If you read their contract you see this:
3.3.6 [..]
Registrar shall provide third-party bulk access to the data subject to
public access under Subsection 3.3.1 under the following terms and
conditions:
3.3.6.1 Registrar shall make a complete electronic copy of the data available at least one (1) time per week for download by third
parties who have entered into a bulk access agreement with Registrar.
3.3.6.2 Registrar may charge an annual fee, not to exceed US$10,000, for such bulk access to the data.
So, in theory, you are able to go to each registrar and ask it to provide "bulk whois access", which means more or less a complete dump of data, but:
- as written in the contract above, it can be costly (there are more than 1000 registrars, and since you cannot know in advance where a domain is registered, you will need to get data from all of them)
- data will not be fresh
- as with zonefiles below, it is not a live query/reply: you will need to download all the data, store it, process it and then use it.
Zonefiles (gTLDs)
Again this mostly applies to gTLDs for reasons explained just after, but see next section for other cases.
This does not allow live queries, as you need to download the data (once per day if you want it fresh), store it somewhere on your infrastructure, and in a format suited to the queries you need to do afterwards (an RDBMS might not be the best storage here).
But this is the "easiest" and widest solution to your problem.
Per their contract with ICANN, all gTLD registries are required to give free access to their zonefiles. A zonefile contains all published domain names under the given TLD. This is a subset of all registered names (it is difficult to say by how much, but probably in the single-digit percentage range, if that), because a domain can be registered without nameservers (hence not published), or it can be put "on hold" for various reasons and thus disappear from the zonefile. So you will get the same kind of false negatives as with live DNS queries: for some domains you will get no data (NXDOMAIN in fact) even though they are in fact registered (and hence not available for registration again).
So all starts there: https://www.icann.org/resources/pages/czds-2014-03-03-en
and the help section for users: https://czds.icann.org/help
You will need to create an account, sign a contract that outlines what you can and cannot do with this data, and then you will be able to download daily zonefiles per TLD. Most, if not all, gTLDs put their zonefiles there. A few may do things differently, so you will need to check.
A zonefile is in DNS "master file" format, so you will see DNS records in it. You only need to handle the NS records to see all domain names. Make sure to normalize them (casing, final dot, etc.) as the content can vary from one file to another.
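A rough sketch of that extraction (assuming the downloaded file is named com.zone and uses the common name TTL class type rdata layout; check a few lines of your file and adjust the field numbers if needed):
$ awk 'toupper($4) == "NS" { n = tolower($1); sub(/\.$/, "", n); print n }' com.zone | sort -u > com-domains.txt
This keeps only the owner names of NS records, lowercases them, strips the final dot and deduplicates (a domain normally has several NS records). Note that the TLD's own apex NS records will show up in the list too.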
Once you have a daily list of domain names, you can apply any tool you want to search through them, including regular expressions. Be cautious, however, about the CPU and RAM load you can create, depending on how you store the data; the raw .com zonefile is around 13GB, for example.
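For example, with plain grep on the file produced above (the pattern is only an illustration; LC_ALL=C speeds grep up considerably on files of this size):
$ LC_ALL=C grep -E '^[a-z0-9-]*shop[a-z0-9-]*\.com$' com-domains.txt | head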
Compared with live DNS queries, the biggest drawback is that it is not live (data may be as much as 24 hours old) and you need to download the files before being able to do anything; the biggest benefit is that you have the list of "all" domains locally, so you can apply much more powerful tools to search through them.
Zonefiles (non gTLDs)
Outside gTLDs, that is in ccTLDs, full zonefiles are rarely available, because many ccTLD operators consider the data proprietary or personally identifiable and believe no one has a valid business reason to get it, hence it is not published.
There are however counter examples:
PS: creative use of search engines (see the site: modifier, for example) can also help; of course they only see existing websites, and a domain name can very well be registered without any website resolving on it.