How can I extract multiple segments from the result of a Whois lookup?

I get an array that results from a Whois lookup (run inside a foreach loop).

For example, I want everything from the "domain...." line to the ">>> Last update of WHOIS database:" line. How do I do that?

The Whois is performed with an exec command:

foreach ($query as $domain) {
    $scanUrl = 'whois '.$domain->url;
    exec($scanUrl, $output);
}

The Whois works without issue, and I can get the created, expires and registrars with a preg_grep:

    $domainCreated  = preg_grep('/created/', $output);
    $domainExpires  = preg_grep('/expires/', $output);
    $domainRegistrar  = preg_grep('/registrar..........:/', $output);

But what I need is to get multiple segments from the array, for example from the domain.... line to the >>> Last update of WHOIS database: line.

All the Whois results are in one array. The Whois result looks like this:

Array
(
[0] =>
[1] => domain.............: iltalehti.fi
[2] => status.............: Registered
[3] => created............: 1.1.1991 00:00:00
[4] => expires............: 31.8.2022 00:00:00
[5] => available..........: 30.9.2022 00:00:00
[6] => modified...........: 6.9.2017
[7] => holder transfer....: 13.7.2013
[8] => RegistryLock.......: no
[9] =>
[10] => Nameservers
[11] =>
[12] => nserver............: a.ns-sec.com [Technical Error]
[13] => nserver............: d.ns-sec.org [OK]
[14] => nserver............: c.ns-sec.fi [178.217.128.53] [2001:67c:224:53::53:1] [OK]
[15] => nserver............: b.ns-sec.net [OK]
[16] =>
[17] => DNSSEC
[18] =>
[19] => dnssec.............: no
[20] =>
[21] => Holder
[22] =>
[23] => name...............: Alma Media Oyj
[24] => register number....: 1944757-4
[25] => address............: PL 140
[26] => address............: 00101
[27] => address............: Helsinki
[28] => country............: Finland
[29] => phone..............: +358 10 665 000
[30] => holder email.......:
[31] =>
[32] => Registrar
[33] =>
[34] => registrar..........: Cybercom Finland Oy
[35] => www................: www.cybercom.com
[36] =>
[37] => >>> Last update of WHOIS database: 24.3.2020 12:45:05 (EET) <<<
[38] =>
[39] =>
[40] => Copyright (c) Finnish Transport and Communications Agency Traficom
[41] =>
[42] =>
[43] => domain.............: yle.fi
[44] => status.............: Registered
[45] => created............: 1.1.1991 00:00:00
[46] => expires............: 31.8.2020 00:00:00
[47] => available..........: 30.9.2020 00:00:00
[48] => modified...........: 16.1.2018
[49] => RegistryLock.......: no
[50] =>
[51] => Nameservers
[52] =>
[53] => nserver............: ns-997.awsdns-60.net [OK]
[54] => nserver............: ns-1394.awsdns-46.org [OK]
[55] => nserver............: ns-1882.awsdns-43.co.uk [OK]
[56] => nserver............: ns-76.awsdns-09.com [OK]
[57] =>
[58] => DNSSEC
[59] =>
[60] => dnssec.............: no
[61] =>
[62] => Holder
[63] =>
[64] => name...............: Yleisradio Oy
[65] => register number....: 0215438-8
[66] => address............: Radiokatu 5
[67] => address............: 00024
[68] => address............: Yleisradio
[69] => country............: Finland
[70] => phone..............: +358914801
[71] => holder email.......:
[72] =>
[73] => Registrar
[74] =>
[75] => registrar..........: Yleisradio Oy
[76] =>
[77] => >>> Last update of WHOIS database: 24.3.2020 12:45:12 (EET) <<<
[78] =>
[79] =>
[80] => Copyright (c) Finnish Transport and Communications Agency Traficom
[81] =>
[82] =>
[83] => domain.............: is.fi
[84] => status.............: Registered
[85] => created............: 12.9.2016 10:01:17
[86] => expires............: 12.9.2020 10:01:17
[87] => available..........: 12.10.2020 10:01:17
[88] => modified...........: 17.9.2017
[89] => holder transfer....: 3.2.2017
[90] => RegistryLock.......: no
[91] =>
[92] => Nameservers
[93] =>
[94] => nserver............: ns-2017.awsdns-60.co.uk [OK]
[95] => nserver............: ns-824.awsdns-39.net [OK]
[96] => nserver............: ns-111.awsdns-13.com [OK]
[97] => nserver............: ns-1159.awsdns-16.org [OK]
[98] =>
[99] => DNSSEC
[100] =>
[101] => dnssec.............: no
[102] =>
[103] => Holder
[104] =>
[105] => name...............: Sanoma Media Finland Oy
[106] => register number....: 1515901-4
[107] => address............: Töölönlahdenkatu 2
[108] => address............: 00100
[109] => address............: Helsinki
[110] => country............: Finland
[111] => phone..............: +35891221
[112] => holder email.......:
[113] =>
[114] => Registrar
[115] =>
[116] => registrar..........: Sanoma Oyj
[117] =>
[118] => >>> Last update of WHOIS database: 24.3.2020 12:46:59 (EET) <<<
[119] =>
[120] =>
[121] => Copyright (c) Finnish Transport and Communications Agency Traficom
[122] =>
[123] =>
[124] => domain.............: hs.fi
[125] => status.............: Registered
[126] => created............: 10.7.2009 00:00:00
[127] => expires............: 14.7.2020 11:17:58
[128] => available..........: 14.8.2020 11:17:58
[129] => modified...........: 7.9.2017
[130] => RegistryLock.......: no
[131] =>
[132] => Nameservers
[133] =>
[134] => nserver............: ns-83.awsdns-10.com [OK]
[135] => nserver............: ns-1635.awsdns-12.co.uk [OK]
[136] => nserver............: ns-1461.awsdns-54.org [OK]
[137] => nserver............: ns-678.awsdns-20.net [OK]
[138] =>
[139] => DNSSEC
[140] =>
[141] => dnssec.............: no
[142] =>
[143] => Holder
[144] =>
[145] => name...............: Sanoma Media Finland Oy / Helsingin Sanomat
[146] => register number....: 1515901-4
[147] => address............: Töölönlahdenkatu 2
[148] => address............: 00100
[149] => address............: Helsinki
[150] => country............: Finland
[151] => phone..............: +35891221
[152] => holder email.......:
[153] =>
[154] => Registrar
[155] =>
[156] => registrar..........: Sanoma Oyj
[157] =>
[158] => >>> Last update of WHOIS database: 24.3.2020 12:45:20 (EET) <<<
[159] =>
[160] =>
[161] => Copyright (c) Finnish Transport and Communications Agency Traficom
[162] =>
)

I've tried stuff like:

$domainRawScan = preg_grep('/\bdomain\b.*\b>>> Last update of WHOIS database:\b/', $output);

But I am very new to using RegExp and find the syntax rather confusing. Any help would be appreciated.

coderv55
  • What exactly do you want to extract? The date with time? – Markus Zeller Mar 22 '20 at 13:17
  • @MarkusZeller I want to extract every line starting from the domain line and ending at the >>> Last update of WHOIS database: -line. – coderv55 Mar 22 '20 at 13:54
  • Do not execute a whois client through the shell, use appropriate whois library from your programming language or at least just open the TCP/43 socket yourself, as whois is a very simple protocol. Also remember that whois output is unstructured so it is very hard to parse it properly in all cases, again there are libraries that do or try to do or do in part that already for you. And for some TLDs, like gTLDs, you should start to look at RDAP instead of whois, where you will rejoice with structured output since it is JSON. – Patrick Mevzek Mar 24 '20 at 00:31

1 Answer

One way of proceeding is to take the $output array returned by the exec command and turn it back into a single string:

$text = implode("\n", $output);

Then use preg_match_all to get all the keywords and values:

preg_match_all('/^(.*?)\\.*: (.+)/m', $text, $matches);

Then $matches[1][n] will have keyword n and $matches[2][n] will have value n.

Regex Demo

^             # Start of line in multiline mode
(             # Start of capture group 1
   .*?        # Match 0 or more characters until ...
)             # End of capture group 1
\.*           # Match 0 or more periods
:             # Match a colon followed by a space
(             # Start of capture group 2
   .+         # Match 1 or more characters up to but not including a newline
)             # End of capture group 2
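
To see what the pattern yields, here is a minimal sketch run against a few lines copied from the sample output in the question. The $sample and $fields names are made up for illustration, and array_combine is only one possible way to pair the two capture groups.

```php
<?php
// Illustrative sample: a few lines taken from the whois output in the question.
$sample = implode("\n", [
    'domain.............: iltalehti.fi',
    'status.............: Registered',
    'created............: 1.1.1991 00:00:00',
    'holder email.......:',
    'registrar..........: Cybercom Finland Oy',
]);

preg_match_all('/^(.*?)\\.*: (.+)/m', $sample, $matches);

// Pair each keyword (group 1) with its value (group 2).
// Repeated keywords such as "nserver" or "address" would overwrite each
// other here, so keep the two parallel arrays if every value is needed.
$fields = array_combine($matches[1], $matches[2]);

print_r($fields);
```

Lines with an empty value, such as the "holder email" line, produce no match because the pattern requires a colon, a space, and at least one further character.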

Update

Each time through the loop you process one domain and its keyword/value pairs, provided $output is emptied first, because exec appends to the array it is given. What you do with the pairs is up to you.

foreach ($query as $domain) {
    // escapeshellarg guards against shell metacharacters in the stored value
    $scanUrl = 'whois ' . escapeshellarg($domain->url);
    $output = []; // start with an empty array
    exec($scanUrl, $output);
    $text = implode("\n", $output);
    preg_match_all('/^(.*?)\\.*: (.+)/m', $text, $matches);
    $n = count($matches[1]); // number of keyword/value pairs
    for ($i = 0; $i < $n; $i++) {
        // display next keyword/value pair:
        echo $matches[1][$i], "->", $matches[2][$i], "\n";
    }
}
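
To see why the loop clears $output before each call, the following sketch calls exec twice with the same array. Here echo stands in for whois (an assumption, so that the example runs without network access), and $appended and $fresh are illustrative names.

```php
<?php
// "echo" stands in for "whois" here so the sketch runs offline.
$output = [];
exec('echo first', $output);
exec('echo second', $output);   // appended to the existing array, not replaced
$appended = $output;
print_r($appended);             // holds the output of both commands

$output = [];                   // reset before the next command
exec('echo third', $output);
$fresh = $output;
print_r($fresh);                // holds only the last command's output
```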

Update 2

Instead of joining the array of lines returned by the exec command into a single string and doing preg_match_all, which will then give you an array of matches, it may be more convenient to do individual preg_match calls against the individual output lines from the exec command:

foreach ($query as $domain) {
    // escapeshellarg guards against shell metacharacters in the stored value
    $scanUrl = 'whois ' . escapeshellarg($domain->url);
    $output = []; // start with an empty array
    exec($scanUrl, $output);
    foreach ($output as $line) {
        if (preg_match('/^(.*?)\\.*: (.+)/', $line, $matches)) {
            echo $matches[1], "->", $matches[2], "\n";
        }
    }
}
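
For the raw segment the question asks about (everything from the "domain" line through the ">>> Last update of WHOIS database:" line), one possible approach is to scan the lines of a single whois result and collect those between the two markers. The sketch below is an illustration, not the only way to do it: extractSegment is a hypothetical helper, and the sample array stands in for the $output of one exec call.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): returns the lines from the first
 * one matching $startPattern through the next one matching $endPattern,
 * inclusive. Returns [] if either marker is missing.
 */
function extractSegment(array $lines, string $startPattern, string $endPattern): array
{
    $segment = [];
    $inside = false;
    foreach ($lines as $line) {
        if (!$inside && preg_match($startPattern, $line)) {
            $inside = true;
        }
        if ($inside) {
            $segment[] = $line;
            if (preg_match($endPattern, $line)) {
                return $segment;
            }
        }
    }
    return []; // start or end marker not found
}

// Sample lines standing in for one whois result.
$output = [
    '',
    'domain.............: yle.fi',
    'status.............: Registered',
    '',
    'registrar..........: Yleisradio Oy',
    '',
    '>>> Last update of WHOIS database: 24.3.2020 12:45:12 (EET) <<<',
    '',
    'Copyright (c) Finnish Transport and Communications Agency Traficom',
];

$segment = extractSegment($output, '/^domain\b/', '/^>>> Last update of WHOIS database:/');
$raw = implode("\n", $segment); // raw text, e.g. for a database column
echo $raw, "\n";
```

Called inside the loop, after exec and with $output reset on every iteration, this gives one raw segment per domain.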
Booboo
  • Thanks for the suggestion. But please bear with me. The array has multiple results of whois, how can I separate them? – coderv55 Mar 24 '20 at 08:38
  • It appears to me that you have a `foreach` loop in which you are doing *one* `whois` query the result of which goes into `$output`. It is in that loop that the code I suggest would go. The code I am suggesting takes that single result, which is an array where each element represents one line of output from `whois`, and reconstitutes a single string from that array of strings and then does a `preg_match_all` against that single string. – Booboo Mar 24 '20 at 10:22
  • I have updated the array in my first post to show the current complete `$output`. I have multiple domains in a database table, which I run `whois` on in a `foreach` loop. I would like to separate the results, so I can insert the "raw" `whois` result of each domain into a database. – coderv55 Mar 24 '20 at 11:07
  • You have multiple outputs from `whois` in `$output` because each time you go through the loop you are appending to the `$output` array. Let me update the answer to make this more explicit. – Booboo Mar 24 '20 at 11:44
  • Thank you very much for taking the time to help me! This answer has helped me a ton! – coderv55 Mar 25 '20 at 08:57