
I get whois information for a bunch of URLs by fetching pages like the following with wget:

wget -qO- https://www.whois.com/whois/SampleDomain

In this first phase I don't want to create a file for each URL, so I use the -qO- option.

I want to extract 10 fields for every domain (such as Creation Date and Registrant Name).

My question is: how can I build a CSV file in which every row represents a domain and each column holds the value of one whois field?

– John

1 Answer


With xmlstarlet, GNU grep and GNU paste, here is a first step:

# Fetch the whois page, repair the HTML, extract the text of the
# <pre> block, grab the two fields, and join them with a comma.
wget -qO - https://www.whois.com/whois/stackoverflow.com |
  xmlstarlet format --html --recover 2>/dev/null |
  xmlstarlet select --template --value-of '//pre' |
  grep -Po '^(Creation Date|Registrant Name): \K.*(?= )' |
  paste -d , - -

Output:

2003-12-26T19:18:07Z,Sysadmin Team
– Cyrus
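
The question also asks for one CSV row per domain. A minimal sketch of how the pipeline above could be wrapped in a loop, assuming a hypothetical input file domains.txt (one domain per line) and a hypothetical output file whois.csv:

# domains.txt is a hypothetical list, one domain per line.
# Each iteration prints the domain, then the two extracted fields,
# giving one CSV row per domain.
while read -r domain; do
  printf '%s,' "$domain"
  wget -qO - "https://www.whois.com/whois/$domain" |
    xmlstarlet format --html --recover 2>/dev/null |
    xmlstarlet select --template --value-of '//pre' |
    grep -Po '^(Creation Date|Registrant Name): \K.*(?= )' |
    paste -d , - -
done < domains.txt > whois.csv

A short sleep between iterations would be polite to whois.com, which may throttle rapid requests.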
  • Thanks for your answer. How can I output the result, e.g. 2003-12-26T19:18:07Z,Sysadmin Team, to a file? – John Nov 01 '17 at 18:28
  • I've updated my answer. – Cyrus Nov 01 '17 at 18:47
  • Thanks again for your update. For multiple fields, such as Registrant Name|Registrant Organization|Registrant City|Registrant Country|Registrar IANA ID|Creation Date|Updated Date|Registry Expiry Date, and the following address (https://www.whois.com/whois/academicreviews.us), it does not print the appropriate result! – John Nov 01 '17 at 19:00
  • There, the lines do not end with a blank character but with a semicolon. Replace `(?= )` with `(?=( |;))`, and with `paste` use one `-` per column. – Cyrus Nov 01 '17 at 19:12
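
Putting that last comment into practice, the multi-field variant would look roughly like this: the lookahead now accepts a trailing blank or a semicolon, and paste gets one `-` per field (eight here). A sketch, not guaranteed against every whois layout:

# Extract eight fields and join them into one comma-separated row.
wget -qO - https://www.whois.com/whois/academicreviews.us |
  xmlstarlet format --html --recover 2>/dev/null |
  xmlstarlet select --template --value-of '//pre' |
  grep -Po '^(Registrant Name|Registrant Organization|Registrant City|Registrant Country|Registrar IANA ID|Creation Date|Updated Date|Registry Expiry Date): \K.*(?=( |;))' |
  paste -d , - - - - - - - -

Note that grep emits matches in the order they appear in the whois record, so the column order follows the record rather than the order of the alternation in the pattern.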